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(54) Title: REAL-TIME SEQUENCE DETERMINATION 



Primer strand: 




TOP 5* 


GOT ACT AAG CGG CCG CAT G 3' 


Tenplat* Strands: 




BOT-T 3' 


CCA TGA TTC GCC GGC GTA CTC 5' 


BOT-C 3' 


CCA TGA TTC GCC GGC GTA CCC 5* 


BOT-G 3' 


CCA TGA TTC GCC GGC GTA CGC 5' 


BOT-A 3' 


CCA TGA TTC GCC GGC GTA CAC 5' 


B0T-3T 3' 


CCA TGA TTC GCC GGC GTA CTT TC 5' 


BOT-Sau 3' 


CCA TGA TTC GCC GGC GTA CCT AQ 5* 



< 



r5 



Zacoxporata s 
(5* to 3>) 



JkO AAMS 




Lane 12 3 4 



O (57) Abstract: A sequencing methodology is disclosed that allows a single DNA or RNA molecule or portion thereof to be se- 
quenced directly and in substantially real time. The methodology involves engineering a polymerase and/or dNTPs with atomic 
^ and/or molecular tags that have a detectable property that is monitored by a detection system. 
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For two-leiier codes and other abbreviations, refer to the "Guid- 
ance Notes on Codes and Abbreviations" appearing at the begin- 
ning of each regular issue of the PCT Gazette. 
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PATENT SPKC][yiCATJlON 

TITLE: REAL-nME SEQUENCE DETERMINATION 

INVENTOR; Susan H. Hardin, James M, Briggs, Shiao-Chun Tu, Xiaolian Gao and Richard 
WiUson 

BACKGROUN D OF THE INVENTION 

1 , FIELD OF THE INVENTION 

The present invention relates to a single-molecule sequencing apparatus and methods. 

More particularly, the present invention relates to a single-molecule sequencing 
apparatus and methods using tagged polymerizing agents and/or tagged monomers where the 
tagged polymerizing agent and/or the tagged monomers undergo a change in a detectable 
property before, during and/or after monomer insertion into a growing polymer chain. The 
apparatus and methods are ideally-suited for sequencing DNA, RNA, polypeptide, 
carbohydrate or similar bio-molecular sequences under near real-time or real-time conditions. 
The present invention also relates to a single-molecule sequencing apparatus and methods 
using tagged depolymerizing agents and/or tagged depolymerizable polymer where the tagged 
depolymerizing agent and/or the tagged depolymerizable polymer undergo a change in a 
detectable property before, during and/or after monomer removal from the depolymerizable 
polymer chain. The apparatus and methods are ideally-suited for sequencing DNA, RNA, 
polypeptide, carbohydrate or similar bio-molecular sequences. The present invention also 
relates to detecting a signal evidencing interactions between the tagged polymerizing agent 
or depolymerizing agent and a tagged or untagged polymer subunit such as a monomer or 
collection of monomers, where the detected signal provides information about monomer 
order. In a preferred embodiment, the metiiods are carried out in real-time or near real-time. 

2. DESCRIPTION OF THE RELATED ART 
Overview of Conventional DNA Sequencing 

The development of methods that allow one to quickly and reliably determine the 
order of bases or 'sequence' in a fragment of DNA is a key technical advance, the importance 
of which cannot be overstated. Knowledge of DNA sequence enables a greater understanding 
of the molecular basis of life. DNA sequence information provides scientists with information 
critical to a wide range of biological processes. The order of bases m DNA specifies the order 
of bases in RNA, the molecule within the cell that directly encodes the informational content 
of proteins. DNA sequence information is routinely used to deduce protein sequence 
information. Base order dictates DNA structure and its function, and provides a molecular 
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program that can specify normal development, manifestation of a genetic disease, or cancer. 

Knowledge of DNA sequence and the ability to manipulate these sequences has 
accelerated development of biotechnology and led to the development of molecular 
techniques that provide the tools to ask and answer important scientific questions. TTie 

polymerase chainreaction(PCR),an important biotechnique that facilitates sequence-spec^ 
detection of nucleic acid, relies on sequence information. DNA sequencing methods allow 
scientists to determine vdiether a change has been introduced into the DNA. and to assay the 
effect of the change on the biology of the organism, regardless of the type of orgahism that 
is being studied. Ultimately. DNA sequence information may provide a way to uniquely 
identify individuals. 

In order to understand the DNA sequencing process, one must recall several facts 
about DNA. First, a DNA molecule is con^sed of four bases, adenine (A), guanine (G), 
cytosine (C). and thymine (T). These bases interact with each other in very specific ways 
through hydrogen bonds, such that A interacts with T, and G interacts with C. These specific 
interactions between the bases are rcfened to as base-pairings. In fact, it is these 
base-pairings (and base stacking interactions) that stabilize double-stranded DNA. The two 

strandsofaDNAmolecdeoccur in anantiparallel orientation, whereonestrandispositione^ 
in the 5- to 3* direction, and the other strand is positioned in the 3' to 5' direction. The tenns 
5' and 3- refer to the directionality of the DNA backbone, and are critical to describing the 
order of the bases. The convention for describing base order in a DNA sequence uses the 5' 

to 3'direction, and is written from leftto right. Thus, if one knows the sequence of oneDNA 
strand, the complementary sequence can be deduced. 

Sanger DNA Seaiienciw g fEnzvmafic Svnthftsis^ 

Sanger sequencing is currently the most commonly used method to sequence DNA 
(Sanger et al., 1977). This method exploits several features of a DNA polymerase: its ability 
to make an exact copy of a DNA molecule, its directionaUty of synthesis (5' to 3% its 
requirement of a DNA strand (a 'primer^ from which to begin synthesis, and its requirei^ent 
for a 3- OH at the end of the primer. If a 3 ' OH is not available, then the DNA strand camiot 
be extended by the polymerase. If a dideoxynucleotide (ddNTP; ddATP, ddTTP, ddGTP, 
ddCTP), a base analogue lacking a 3' OH, is added mto an enzymatic sequencing reaction,' 
it is incorporated into the growing strand by the polymerase. However, once the ddNTP is 
incorporated, the polymerase is unable to add any additional bases to the end of the strand. 
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Importantly, ddNTPs are incorporated by the polymerase into the DNA strand using the same 
base incorporation rules that dictate incorporation of natural nucleotides, where A specifies 
incorporation of T, and G specifies incorporation of C (and vice versa). 
Fluorescent DNA Sequencing 

A major advance in determining DNA sequence information occurred with the 
introduction of automated DNA sequencing machines (Smith et al., 1986). The automated 
sequencer is used to separate sequencing reaction products, detect and collect (via computer) 
the data from the reactions, and analyze the order of the bases to automatically deduce the 
base sequence of a DNA fragment. Automated sequencers detect extension products 
containing a fluorescent tag. Sequence read lengths obtained using an automated sequencer 
are dependent upon a variety of parameters, but typically range between SOO to 1,000 bases 
(3-18 hours of data collection). At maximum capacity an automated sequencer can collect 
data from 96 samples in parallel. 

When dye-labeled terminator chemistry is used to detect the sequencing products, 
base identity is determined by the color of the fluorescent tag attached to the ddNTP. After 
die reaction is assembled and processed through the appropriate number of cycles (3-12 
hours), the extension products are prepared for loading into a single lane on an automated 
sequencer (unincorporated, dye-labeled ddNTPs are removed and the reaction is concentrated; 
1-2 hours). An advantage of dye-terminator chemistry is fliat extension products are 
visualized only if they terminate with a dye-labeled ddNTP; prematurely terminated products 
are not detected. Thus, reduced background noise typically results with this chemistry. 

State-of-tihe-art dye-terminator chemistry uses four energy transfer fluorescent dyes 
(Rosenblum et al., 1 997). These termmators include a fluorescein donor dye (6-FAM) linked 
to one of four different dichlororhodamine (dRhodamine) acceptor dyes. The d-Rhodamine 
acceptor dyes associated with the terminators are dichloro[R110], dichloro[R6G], 
dichlorofTAMRA] or dichloro[ROX], for the G-, A-, T- or C-terminators, respectively. The 
donor dye (6-FAM) efficiently absorbs energy from the argon ion laser in the automated 
sequencing machine and transfers that energy to the linked acceptor dye. The linker 
cotmecting the donor and acceptor portions of the terminator is optimally spaced to achieve 
essentially 100% efficient energy transf^. The fluorescence signals emitted from these 
acceptor dyes exhibit minimal spectral ov&Asp and are collected by an ABI PRISM 377 DNA 
sequencer using 1 0 imi virtual filters centered at 540, 570, 595 and 625 nm, for G-, A-, T- or 
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C-tenninators, respectively. Thus, energy transfer dye-labeled teminatora produce brighter 
signals and improve spectral resolution. Hiese improvements result in more accurate DNA 
sequence information. 

The predominant enzyme used in automated DNA sequencing reactions is a 
genetically engineered fomi of DNA polymerase I from Themus aquaticus. ITus enzyme, 
AmpliTaq DNA Polymerase. FS. was optimized to more efficientiy incoipoiate ddNlPs and 
to eliminate tiie 3' to 5' and 5' to 3' exonuclease activities. Replacing a naturally occmring 
phenylalanine at position 667 in T. aquaticus DNA polymerase wifl, a tyrosine reduced the 
preferential incorporation of a dNTP, relative to a ddNTP (Tabor and Richardson. 1995- 
Reeve and FuUer. 1 995). Thus, a single hydroxyl group >viti)in the polymerase is responsible 
for discrimination between dNTPs and ddNTPs. The 3' to 5' exonuclease activity, which 
enables die polymerase to remove a mis-incorporated base fiom the newly replicat^ DNA 
strand (proofreading activity), was eliminated because it also allows the polymerase to 
remove an incorporated ddNTP. The 5" to 3' exonuclease activity was eliminated because it 
removes bases from tiie 5' end of die reaction products. Since the reaction products are size 
separated durir^g gel electrophoresis, mterpretable sequence data is only obtained if the 
reaction products share a common enc^int More specifically, the primer defines the 5' end 

oftiieextensionproduct and theincorporated, color-coded ddNTP defines base identity atdie 

3'endoftiiemolecule.Thus.conventionalDNAsequencing involves analysisofapopulation 
of DNA molecules sharing the same 5' endpoint. but differing in tire location of the ddNTP 
at the 3' end of the DNA chain. 
Genome S equencing 

Very often a researcher needs to determine the sequence of a DNA fragment that is 
larger tiran the 500-1.000 base average sequencing read length. Not surprisingly, strategies 

to accompUshtius have been developed. These strategies are divided into two major classes, 
random or directed, and strategy choice is influenced by the size of flie fragment to bi 
sequenced. 

In random or shotgun DNA sequenciiig, a large DNA fragment (typically one larger 
than 20.000 base pairs) is broken into smaUer fragments timt are inserted into a cloning 
vector. It is assumed that the sum of information contained witiun fliese smaUer clones is 

equivalent to tiiat contained witiiintixeoriginalDNA fragment Numerous smaUer clones are 
randomly selected. DNA templates are prepared for sequencing reactions, and primers tiiat 
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will base-pair with the vector DNA sequence bordering the insert are used to begin the 
sequencing reaction (2-7 days for a 20 kbp insert). Subsequently, the quality of each base call 
is examined (manually or automatically via software (PHRED, Ewing et al., 1998); 1-10 
minutes per sequence reaction), and the sequence of the original DNA fragment is 
reconstructed by computer assembly of the sequences obtained from the smaller DNA 
fragments. Based on the time estimates provided, if a shotgun sequencing strategy is used, 
a 20 kbp insert is expected to be completed in 3-1 0 days. This strategy was extensively used 
to determine the sequence of ordered fragments that represent the entire human genome 
(http://www.nhgri.nih.gov/HGP/). However, this random approach is typically not sufficient 
to complete sequence determination, since gaps in the sequence often remain after computer 
assembly. A directed strategy (described below) is usually used to complete the sequence 
project. 

A directed or primer-walking sequencing strategy can be used to fill-in gaps 
remaining after the random phase of large-fragment sequencing, and as an efficient approach 
for sequencing smaller DNA fragments. This strategy uses DNA primers that anneal to the 
template at a single site and act as a start site for chain elpngation. This approach requires 
knowledge of some sequence information to design the primer. The sequence obtained from 
the first reaction is used to design the primer for the next reaction and these steps are repeated 
until the complete sequence is determined. Thus, a primer-based strategy involves repeated 
sequencing steps from known into unknown DNA regions, the process minimizes 
redundancy, and it does not require additional cloning steps. However, this strategy requires 
the synthesis of a new primer for each round of sequencing. 

The necessity of designing and synthesizing new primers, coupled with the expense 
and the time required for their synthesis, has limited the routine application of primer-walking 
for sequencing large DNA fragments. Researchers have proposed using a library of short 
primers to eliminate the requirement for custom primer synthesis (Studier, 1 989; Siemieniak 
andSlightom, 1990;Kieleczawaetal., 1992;Kotleretal., 1993; Burbelo and ladarola, 1994; 
Hardin et al., 1996; Raja et al., 1997; Jones and Hardin, 1998a,b; Ball et al„ 1998; Mei and 
Hardin, 2000; Kraltcheva and Hardin, 2001). The availability of a pruner library minimizes 
primer waste, since each primer is used to prime multiple reactions, and allows immediate 
access to the next sequencing prim^. 

One of &e original goals of the Human Genome Project was to complete sequence 
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determination of the entire human genome by 2005 (http://www,nhgri.nih.gov/HGP/). 
However, the plan is ahead of schedule and a 'working draft' of the human genome was 
published in February 2001 (Venter er al., 2001, "International Human Genome Sequencing 
Consortium 2001"). Due to technological advances in several disciplines, the completed 
genome sequence is expected in 2003, two years ahead of schedule. Progress in all aspects 
involving DNA manipulation (especiaUy manipulation and propagation of large DNA 
fiagments), evolution of faster and better DNA sequencmg methods (http://www.abrf.org), 
development of computer hardware and software capable of manipulating and analyzmg the 

data(biomformatics), and automationof procedures associated with generatingand analyzing 
DNA sequences (engineermg) are responsible for this accelerated tune fiame. 
Single-Molecule DNA yigq^Bp..^np 

Conventional DNA sequencmg strategies and methods are reliable, but time, labor, 
and cost intensive. To address these issues, some researchers are mvestigating fluorescence- 
based, single-molecule sequencmg methods that use enzymatic degradation, followed by 
single-dNMP detection and identificatioa The DNA polymer containing fluorescently- 
labeled nucleotides is digested by an exonuclease, and the labeled nucleotides are detected 
andidentifiedbyflow cytometry (Davise/ a/., 1991; Davis era/., 1992; Goodwin e/ a/., 1997; 
Keller etd.. 1996; Sauer et al. 1999; Werner et al, 1999). This method requires that the 
DNA strand is synthesized to contain the flourescently-labeled base(s). This requirement 
limits the length of sequence that can be determined, and increases the number of 
manipulations that must be performed before any sequence data is obtained. A related 
approach proposes to sequentially separate smgle (unlabeled) nucleotides from a strand of 
DNA, confine them in their original order in a solid matrix, and detect the spectroscopic 
emission of the separated nucleotides to reconstruct DNA sequence inft>rmation (Uhner, 
1997;MitsisandKwagh, 1999;Dapprich, 1999). This is the approach that is bemg developed 
by Praelux, Lie, a company with a goal to develop single-molecule DNA sequencmg. 
TheoreticaUy, this latter method should not be as susceptible to length limitations as the 
former enzymatic degradation method, butitdoes require numerous manipulations before any 
sequence information can be obtained. 

Li-cor, Inc. is developmg an enzyme synthesis based strategy for single-molecule 
sequencing as set forth in PCT application WO 00/36151. The Li-cor method involves 
multiply modifying each dNTP by attaching a fluorescent tag to the y-phosphate and a 
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quenching moiety to the another site on the dNTP, preferably on the base. The quenching 
moiety is added to prevent emission from the fluorescent tag attached to an unincorporated 
dNTP. Upon incorporation the fluorescent tag and quenching moiety are separated, resulting 
in emission from the tag. The tag (contained on the pyrophosphate) flows away from the 
polymerase active site, but the modified (quenched) base becomes part of the DNA polymer. 

Although some single-molecular sequencing systems have been disclosed, many of 
them anticipate or require base modification. See, e.g.. Patent Application Serial Numbers 
WO 01/16375 A2, WO 01/23610 A2, WO 01/25480, WO 00/06770, WO 99/05315, WO 
00/60114, WO 00/36151, WO 00/36512, and WO 00/70073, incorporated herein by 
reference. Base modifications may distort DNA structure (which normally consists of A- 
form DNA nearest the enzyme active site; Li et aL, 1998a). Since fhe dNTP and 
approximately 7 of the 3 -nearest bases in tfie newly synthesized strand contact internal 
regions of the polymerase (Li et al., 1998a), the A-fonn DNA may be important for 
maximizing minor groove contacts between the enzyine and the DNA. If the DNA structure 
is affected due to base modification, enzyme fidelity and/or function may be altered. Thus, 
there is still a need in the art for a fast and efiScient enzymatic DNA sequencing system for 
single molecular DNA sequences. 

SUMMARY OF THE INVENTION 
SINGLE-MOLECULE SEQUENCING 

The present invention provides a polymerizing agent modified with at least one 
molecular or atomic tag located at or near, associated wdth or co valentiy bonded to a site on 
the polymerizing agent, where a detectable property of the tag undergoes a change before, 
during and/or after monomer incorporation. The monomers can be organic, inorganic or bio- 
organic monomers such as nucleotides for DNA, RNA, mixed DNA/RNA sequences, amino 
acids, monosaccharides, synthetic analogs of naturally occurring nucleotides, synthetic 
analogs of naturally occiining amino acids or synthetic analogs of naturally occurring 
monosaccharides, synthetic organic or inorganic monomers, or the like. 

The present invention provides a depolymerizing agent modified with at least one 
molecular or atomic tag located at or near, associated with or covalentiy bonded to a site on 
the depolymerizing agent, where a detectable property of the tag undergoes a change before, 
during and/or after monomer removal. The polymers can be DNA, RNA, mixed DNA/RNA 
sequences containing only naturally occurring nucleotides or a mixture of naturally occurring 
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nucleotides and synthetic analogs thereof, polypeptide sequences containing only naturally 
occurring amino acids or a mixture of naturally occurring amino acids and synthetic analogs 
thereof, polysaccharide or carbohydrate sequences containing only naturally occurring 
monosaccharides or a mixture of naturally occurring monosaccharides and synthetic analogs 
thereof, or polymers containing synthetic organic or inorganic monomers, or the like. 

The present invention also provides a system that enables detecting a signal 
corresponding to a detectable property evidencing changes in interactions between a 
synthesizing/polymerizing agent or a depolymerizing agent (molecule) and its substrates 
(monomers or depolymerizable polymers) and decoding the signal into monomer order 
specific information or monomer sequence information, preferably in real-time or near real- 
time. 

SINGLE SITE TAGGED POLYMERASE 

The presrait invention provides a polymerase modified with at least one molecular or 
atomic tag located at or near, associated with, or covalently bonded to a site on the 
polymerase, where a detectable property of the tag undergoes a change before, during and/or 
after monomer incorporation. The monomers can be nucleotides for DNA, RNA or mixed 
DNA/KNA monomers or synthetic analogs polymerizable by the polymerase. 

The present invention provides an exonuclease modified with at least one molecular 
or atomic tag located at or near, associated with, or covalently bonded to a site on the 
exonuclease, where a detectable property of the tag undergoes a change before, during and/or 
after monomer release. The polymers can be DNA, RNA or mixed DNA/RNA sequences 
comprised of naturally occurring monomers or synthetic analogs depolymerizable by the 
exonuclease. 

The present invention provides a polymerase modified with at least one molecular or 
atomic tag located at or near, associated with, or covalently bonded to a site that undergoes 
a conformational change before, during and/or after monomer incorporation, where the tag 
has a first detection propensity when the polymerase is in a first conformational stale and a 
second detection propensity when the polymerase is in a second conformational state. 

The present inventionprovidesapolymerase modified with at least one chromophore 
located at or near, associated with, or covalently bonded to a site that undergoes a 

conformationalchange before, during and/or after monomer incorporation, whaeanintensity 
and/or fi»quency of emitted Ught of the chromophore has a first value when the polymerase 
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is in a first conformational state and a second value when the polymerase is in a second 
conformational state. 

present invention provides a polymerase modified v^th at least one fluorescently 
active molecular tag located at or near, associated with, or covalently bonded to a site that 
undergoes a conformational change before, during and/or after monomer incorporation, where 
the tag has a first fluorescence propensity when the polymerase is in a first conformational 
state and a second fluorescence propensity when the polymerase is in a second 
conformational state. 

The present invention provides a polymerase modified vntti a molecular tag located 
at or near, associated with, or covalently bonded to a site that xmdergoes a conformational 
change before, during and/or after monomer incorporation, ^ere the tag is substantially 
detectable when the polymerase is in a first conformational state and substantially non- 
detectable vsdien the polymerase is in a second conformational state or substantially non- 
detectable when the polymerase is in the first conformational state and substantially 
detectable "when the polymerase is in the second conformational state. 

The present invention provides a polymerase modified with at least one molecular or 
atomic tag located at or near, associated with, or covalently bonded to a site that interacts with 
a tag on the released pyrophosphate group, where the polymerase tag has a first detection 
propensity before interacting with the t^ on the released pyrophosphate group and a second 
detection propensity when interacting with the tag on the released pyrophosphate group. In 
a preferred embodiment, this change in detection propensity is cyclical occurring as each 
pyrophosphate group is released. 

The present invention provides a polymerase modified with at least one chromophore 
located at or near, associated vsdth, or covalently bonded to a site that interacts with a tag on 
the released pyrophosphate group, where an intensity and/or firequency of light emitted by the 
chromophore has a Gist value before the chromophore interacts with the tag on the released 
pyrophosphate and a second value when int^acting with the tag on the released 
pyrophosphate group. Lot a preferred embodiment, this change in detection propensity is 
cyclical occurring as each pyrophosphate group is released. 

The present invention provides a polymerase modified with at least one fluorescentiy 
active molecular tag located at or near, associated witii, or covalentiy bonded to a site that 
interacts with a tag on the released pyrophosphate group, where the polymerase tag changes 
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from a first state prior to release of the pyrophosphate group and a second state as the 
pyrophosphate group diffuses away from the site of release. In a preferred embodiment, this 
change in detection propensity is cyclical occurring as each pyrophosphate group is released. 

The present invention provides a polymerase modified with a molecular tag located 
at or near, associated with, or covalentiy bonded to a site that interacts with a tag on the 
released pyrophosphate group, where the polymerase tag changes from a substantiaUy 
detectable state prior to pyrophosphate release to a substantially non-detectable state when 
the polymerase tag interacts with the tag on the pyrophosphate group after group release, or 
changes fix>m a substantially non-detectable state prior to K«>phosphate release to a 
substantially detectable state when the polymerase tag intenicts with the tag on the 
pyrophosphate group after group release. 

MULTIPLE SITE TAGGED POLYMERIZING OR DEPOLYMERIZING AGENTS 
The present invention provides a monomer polymeriamg agent modified with at least 
one pair of molecular and/or atomic tags located at or near, associated with, or covalentiy 
bonded to sites on the polymerizing agent, where a detectable property of at least one tag of 
the pair undergoes a change before, during and/or after monomer incorporation or where a 
detectable property of at least one tag of the pair tmdagoes a change before, during and/or 
after monomra- incorporation due to a change in inter-tag interaction. 

The present inventionprovides a depolymerizing agent modified with at least one pah 
of molecular and/or atomic tags located at or near, associated with, or covalentiy bonded to 
sites on the depolymerizing agent, where a detectable property of at least one tag of the pair 
undergoes a change before, during and/or after monomer release or where a detectable 
property of at least one tag of the pair undergoes a change before, during and/or after 
monomer release due to a change in inter-tag interaction. 

The present invention provides a monomer polymerizing agent modified with at least 
one pair of molecular and/or atomic tags located at or near, associated with, or covalentiy 
bonded to sites on the polymerizing agent, where a detectable property of at least one tag of 
the pair has a first value when the polymerizing agent is in a first state and a second value 
v\*en the polymerizing agent is in a second state, where the polymerizing agent chan^ fiom 
the first state to the second state and back to the first state during a monomer incorporation 
cycle. 

The present invention provides a depolymerizing agent modified with at least one 
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pair of molecular and/or atomic tags located at or near, associated with or covalently bonded 
to sites on the polymerizing agent, where a detectable property of at least one tag of the pair 
has a first value when the depolymerizing agent is in a first state and a second value when the 
depolymerizing agent is in a second state, where the depolymerizing agent changes fi*om the 
first state to the second state and back to the first state during a monomer release cycle. 

Preferably, the first and second states are different so that a change in the detected 
signal occurs. However, a no-change result may evidence other properties of the 
polymerizing media or depolymerizing media. 
MULTIPLE SITE TAGGED POLYMERASE 

The present invention provides a polymerase modified with at least one pair of 
molecular tags located at or near, associated with, or covalently bonded to sites at least one 
of the tags undergoes^ a change during monomer incorporation, where a detectable propaly 
of the pair has a first value when the polymerase is in a first state and a second value when 
the polymerase is in a second state, where the polymerase changes from the first state to the 
second state and back to the first state during a monomer incorporation cycle. 

The present invention provides a polymerase modified with at least one pair of 
molecular tags located at or near, associated with or covalently bonded to sites at least one 
of the tags undergoes conformational change during monomer incorporation, where the 
detectably property of the pair has a first value when the polymerase is in a first 
conformational state and a second value when the polymerase is in a second conformational 
state, where the polymerase changes firom the first state to the second state and back to the 
first state diaring a monomer incorporation cycle. 

The present invention provides a polymerase modified v^th at least one pair of 
molecules or atoms located at or near, associated with or covalently bonded to sites at least 
one of the tags undergoes conformational change during monomer incorporation, where the 
pah: interact to form a chromophore when the polymerase is in a first conformational state or 
a second conformational state, where the polymerase changes fi:om the first state to the 
second state and back to the first state during a monomer incorporation cycle. 

The present invention provides a polymerase modified with at least one pair of 
molecular tags located at or near, associated with or covalently bonded to sites at least one 
of the tags undergoes conformational change during monomer incorporation, where the t^ 
have a first fluorescence propensity when the polymerase is ui a first confotmatioiml state and 
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a second fluorescence propensity when the polymerase is in a second conformational state, 
where the polymerase changes from the first state to the second state and back to the first 
state during a monomer incorporation cycle. 

The present invention provides a polymerase modified with at least one pair of 
molecular tags located at or near, associated with or covalently bonded to sites at least one 
of the tags undergoes conformational change during monomer incorporation, where the pm 
is substantially active when the polymerase is in a first conformational state and substantially 
inactive when the polymerase is in a second conformational state or substantially inactive 
when the polymerase is in the first conformational state and substantially active when the 
polymerase is in the second conforaiational state, where the polymerase changes fiom the 
first state to the second state and back to the first state during a monomer incorporation cycle. 

The present invention provides a polymerase modified with at least one pair of 
molecular tags located at or near, associated with, or covalently bonded to sites at least one 
of the tags undergoes a change during and/or after pyrophosphate release during the monomer 
incorporation process, where a detectable property of the pair has a first value when the tag 
is in a first state prior to pyrophosphate release and a second value when the tag is in a second 
state during and/or after pyrophosphate release, where the tag changes from its first state to 
its second state and back to its first state during a monomer incorporation cycle. 

The present invention provides a polymerase modified with at least one pair of 
molecular tags located at or near, associated with or covalently bonded to sites at least one 
of the tags undergoes a change in position due to a conformational change in the polymerase 
during the pyrophosphate release process, where the detectably property of the pair has a first 
value when the tag is in its first position and a second value when the tag is in its second 
position, where the tag changes from its first position to its second position and back to its 
first position during a release cycle. 

The present mvention provides a polymerase modified with at least one pair of 
molecules or atoms located at or near, associated with or covalently bonded to sites, ^ere 
the tags change relative separation due to a conformational change in the polymerase during 
pyrophosphate release, where the tags interact to form a chromophore havmg a first emission 
profile when the tags are a first distance apart and a second profile vAiea the tags are a second 
distance apart, where the separation distance changes fix)m its first state to its second state and 
back to its first state during a pyrophosphate release cycle. 
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The present invention provides a polymerase modified with at least one pair of 
molecular tags located at or near, associated with or covalently bonded to sites, where the tags 
change relative separation due to a conformational change in the polymerase during 
pyrophosphate release, where the tags have a first fluorescence propensity when the 
polymerase is in a first conformational state and a second fluorescence propensity when the 
polymerase is in a second conformational state, where the propensity changes firom its the 
first value to its second value and back again during a pyrophosphate release cycle. 

The present invention provides a polymerase modified with at least one pair of 
molecular tags located at or near, associated with or covalently bonded to sites, where the tags 
change relative separation due to a conformational change in the polymerase during 
pyrophosphate release, where the pair is substantially fluorescently active when the tags have 
a first separation and substantially fluorescently inactive when the tags have a second 
separation or substantially fluorescently inactive when the tags have the first separation and 
substantially fluorescently active when the tags have the second separation, where the 
fluorescence activity undergoes one cycle during a pyrophosphate release cycle. 

It should be recognized that when a property changes fi-om a first state to a second 
state and back again, then the property undergoes a cycle. Preferably, the first and second 
states are different so that a change in the detected signal occurs. However, a no-change 
result may evidence other properties of the polymerizing medium or depolymerizing mediimi. 
METHODS USING TAGGED POLYMERIZING AGENT 

The present invention provides a method for determining when a monomer is 
incorporated into a growing molecular chain comprising the steps of monitoring a detectable 
property of an atomic or molecular tag, where the tag is located at or near, associated with, 
or covalently bonded to a site on a polymerizing agent, where the detectable property of the 
tag undergoes a change before, during and/or after monomer incorporation. 

The present invention provides a method for determining when a monomer is 
incorporated into a growing molecular chain comprising the steps of monitoring a detectable 
property of an atomic or molecular tag, where the tag is located at or near, associated with, 
or covalently bonded to a site on a polymerizing agent, where the detectable property has a 
furst value when the agent is in a first state and a second value wdien the agent is 
state, where the agent changes firom the first state to the second state and back to the fjrst state 
during a monomer incorporation cycle. 
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Preferably, the first and second states are different so that a change in the detected 
signal occurs. However, a no-change result may evidence other properties of the 
polymerizing medium. 

METHODS USING TAGGED POLYMERASE 

The present invention provides a method for determining when or whether a monomer 
is incorporated into a growing molecular chain comprising the steps of monitoring a 
detectable property of a tag, where the tag is located at or near, associated \wth, or covalently 
bonded to a site on a polymwase, where the site undergoes a change during monomer 
incorporation and where the detectable property has a first value when the polymraase is in 
a first state and a second value when the polymerase is m a second state, where the values 
signify that the site has undergone the change and where the polymerase changes fix)m the 
first state to the second state and back to the first state during a monomer incorporation cycle. 

The present inventionprovidesamethod for determiningv^^en or wheflieramonomer 
is incorporated into a growing molecular chain con^sing the steps of monitoring a 
detectable property of a tag, where Ae tag is located at or near, associated with, or covalently 
bonded to a site on a polymerase, where the site undergoes a conformational change during 
monomer incoiporation and where the detectable property has a first value when the 
polymerase is in a first conformational state and a second value when the polymerase is m 
a second conformational state, where the values signify that the site has undergone the change 
and wdiere ttie polymerase changes fix)m the first state to the second state and back to the first 
state during a monomer incorporation cycle. 

The present invention provides amethod for determining when or whether amonomer 
is incorporated into a growing molecular chain comprising the steps of exposing a tagged 
polymerase to light, monitoring an intensity and/or fi-equency of fluorescent Ught emitted by 
the tagged polymerase, where the tagged polymerase comprises a polymerase mcluding a tag 
located at or near, associated with, or covalently bonded to a site that undergoes 
conformational change during monomer incorporation and where flie tag emits fluorescent 
Ught at a first intensify and/or firequency when the polymerase is in a first conformational 
state and a second intensify and/or fi«quency vibeD. the polymerase is in a second 
conformational state, v/hece the change in intensities and/or fixsquendes signifies that the site 
has undergone the change and where the polymerase changesfrom the first state to the second 
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state and back to the first state during a monomer incorporation cycle. 

The present invention also provides the above methods using a plurality of tagged 
polymerases permitting parallel and/or massively parallel sequencing simultaneously. Such 
parallelism can be used to ensure confidence. Such parallelism can also be used to quickly 
detect the degree of homology in DNA sequences for a given gene across species or to 
quickly screen patient DNA for specific genetic traits or to quickly screen DNA sequences 
for polymorphisms. 

The present invention also provides a method for determining if or wrhen a monomer 
is incorporated into a growing DNA chain associated with a polymerase, where a tag is 
located on the polymerase so that as the pyrophosphate group is released after base 
incorporation and prior to its difiiision away fi'om the polymerase, the polymerase tag 
interacts with the tag on the pyrophosphate causing a change in a detectable property of one 
of the tags or a detectable property associated with both tags in ihe case of a fluorescent pair. 

Preferably, the first and second states are different so that a change in the detected 
signal occurs. However, a no-change result may evidence other properties of the 
polymerizing media. 

APPARATUSES USING TAGGED POLYMERIZING AGENT 

The present invention provides a single-molecule sequencing apparatus comprising 
a substrate having deposited thereon at least one tagged polymerizing agent. The tagged 
polymerizing agent can be placed on the surface of the substrate in an appropriate 
polymerizing medium or the polymerizing agent can be confined in a region, area, well, 
groove, channel or other similar structure on the substrate. The substrate can also include a 
monomer region, area, well, groove, channel, reservoir or other shnilar structure on the 
substrate coimected to the polymerizing agent confinement structure by at least one 
coimecting structure capable of supporting molecular transport of monomer to the 
polymerizing agent such as a channel, groove, or the like. Alternatively, the substrate can 
include structures containing each monomer, where each structure is connected to the 
polymerizing agent confinement structure by a connecting structure capable of supporting 
molecular transport of monomer to the polymerizing agent. The substrate can also be 
subdivided into a plurality of polymerizing agent confinement structures, vfhere each 
structure is connected to a monomer reservoir. Alternatively, each polymerizing agent 
confinement structure can have its own monomer reservoir or sufficient monomer reservoirs 



wo 02/04680 



PCT/USOl/21811 



.16^ 

so that each reservoir contains a specific monomer. 

The present invention also provides a single-molecule sequencing apparatus 
comprising a substrate having at least one tagged polymerizing agent attached to the surface 
of the substrate by a molecular tether or Imking group, where one end of the tether or linking 
group is bonded to a site on the surface of the substrate and the other end is bonded to a site 
on the polymerizing agent or bonded to a site on a molecule strongly associated with the 
polymerizing agent. In this context, the temi "bonded to" means that chemical and/or 
physical interactions sufficient to maintain the polymerizing agent within a given region of 
the substrate under nonnal polymerizing conditions. The chemical and/or physical 
interactions mclude, without limitation, covalent bonding, ionic bonding, hydrogen bonding, 
apolar bonding, attractive electrostatic interactions, dipole interactions, or any other electrical 
or quantum mechanical mteraction sufficient in toto to maintain the polymerizing agent in a 
desired region of the substrate. The substrate having tethered tagged polymerizing agent 
attached thereon can be placed m container contaming an appropriate polymerizing medium. 
Alternatively, the tagged polymerizing agent can be tethered or anchored on or within a 
region, area, well, groove, channel or other similar structure on the substrate capable of being 
filled with an appropriate polymerizmg medium. The substrate can also include a monomer 
region, area, well, groove, channel or other similar structure on the substrate connected to the 
polymerizing agent structure by at least one a connecting structure capable of supporting 
molecular transports of monomer to the polymerizing agent. Alternatively, the substrate can 
include structures containing each monomer, where each structure is connected to the 
polymerizing agent structure by a connecting structure capable of supportmg molecular 
transports of monomer to the polymerizing agent. The substrate can also be subdivided into 
a plurality of polymerizmg agent structures each having at least one tethered polymerizmg 
agent, where each structure is connected to a monomer reservoir. Alternatively, each 
polymerizmg agent structure can have its own monomer reservoir or sufficient monomer 
reservoirs, one reservoir of each specific monomer. 

The monomers for use in these apparatus including, without limitation, dNTPs, tagged 
dNTPs, ddNlTs, tagged dd>m>s, andno acids, tagged aniino 

monosaccharides or appropriate mixtures or combinations thereof depending on the type of 
polymer being sequenced. 

APPARATUS USING TAGGED POLYMERASE 
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The present invention provides a single-molecule sequencing apparatus comprising 
a substrate having deposited thereon at least one tagged polymerase. The tagged polymerase 
can be placed on the surface of the substrate in an appropriate polymerizing medium or the 
polymerase can be confined in a region, area, well, groove, channel or other similar structure 
on the substrate capable of being filled vsdth an appropriate polymerizing medium. The 
substrate can also include a monomer region, area, well, groove, channel or other similar 
structure on the substrate comiected to the polymerase confinement structure by at least one 
connecting structure capable of supporting molecular transports of monomer to the 
polymerase. Alternatively, the substrate can include structures containing each monomer, 
where each structure is connected to the polymerase confinement structure by a connecting 
structure capable of supporting molecular transports of the monomer to the polymerase in the 
polymerase confinement structures. The substrate can also be subdivided into a plurality of 
polymerase confinement structures, where each structure is connected to a monomer 
reservoir. Alternatively, each polymerase confinement structure can have its own monomer 
reservoir or four reservoirs, each reservoir containing a specific monomer. 

The present invention also provides a single-molecule sequencing apparatus 
comprising a substrate having at least one tagged polymerase attached to the surface of the 
substrate by a molecular tether or linking group, where one end of the tether or linking group 
is bonded to a site on the surface of the substrate and the other end is bonded (either directly 
or indirectly) to a site on the polymerase or bonded to a site on a molecule strongly associated 
with the polymerase. In this context, the term "bonded to" means that chemical and/or 
physical interactions sufficient to maintain the polymerase within a given region of the 
substrate under normal polymerizing conditions. The chemical and/or physical interactions 
include, without limitation, covalent bonding, ionic bonding, hydrogen bonding, apolar 
bonding, attractive electrostatic interactions, dipole interactions, or any other electrical or 
quantum mechanical interaction sufficient in toto to TnaiTit«^n the polymerase in its desired 
region. The substrate having tethered tagged polymerizing agent attached thereon can be 
placed in container containing an appropriate polymerizing medium. Alternatively, the 
tagged polymerizing agent can be tethered or anchored on or within a region, ai^ well, 
groove, channel or other similar structure on the substrate capable of being filled with an 
appropriate polymerizing medium. The substrate can also uxclude a monomer region, area, 
well, groove, channel or other similar structure on the substrate connected to the polymerase 
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Structure by at least one channel. Alternatively, the substrate can include structures 
containing each monomer, where each structure is connected to the polymerase stmcture by 
a connecting structure that supports molecular transports of the monomer to the polymerase 
in the polymerase confinement structures. The substrate can also be subdivided into a 
plurality of polymerase structures each having at least one tethered polymerase, where each 
structure is connected to a monomer reservoir. Alternatively, each polymerase structure can 
have its own monomer reservoir or four reservoirs, each reservoir containing a specific 
monomer. 

The monomers for use in these apparatus including, without limitation, dNTPs, tagged 
dNTPs, ddNTPs, tagged ddNTPs, or mixtures or combinations thereof. 
METHODS USING THE SINGLE-MOLECULE SEQUENCING APPARATUSES 

The present invention provides a method for single-molecule sequencing comprising 
the step of supplying a plurality of monomers to a tagged polymerizing agent confined on or 
tethered to a substrate and monitoring a detectable property of the tag over time. The method 
can also include a step of relating changes in the detectable property to the occurrence 
(timing) of monomer addition and/or to the identity of each incorporated monomer and/or to 
the near simultaneous determination of the sequence of incorporated monomers. 

The present invention provides a method for single-molecule sequencing comprising 
the step of supplying a plurality of monomers to a tagged polymerizing agent confined on or 
tethered to a substrate, exposing the tagged polymerizing agent to light either continuously 
or periodically and measuring an intensity and/or fiequency of fluorescent light emitted by 
the tag over time. The method can fiirther comprise relating the changes in the measured 
intensity and/or fi-equency of emitted fluorescent Ught from the tag over time to the 
occurrence (timing) of monomer addition and/or to the identity of each incorporated 
monomer and/or to the near simultaneous determination of the sequence of die incorporated 
monomers. 

The present invention provides a method for single-molecule sequencing comprising 
the step of siqiplying apluraUty of monomers to a tagged polymerase confined on or tethered 
to a substrate and monitoring a detectable property of the tag over time. The method can also 
include a step of relating changes in the detectable property over time to the occunence 
(tuning) of monomer addition and/or to the identity of each incorporated monomer and/or to 
the near sunultaneous determination of flie sequence of tiie incorporated monomers. 
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The present invention provides a method for single-molecule sequencing comprising 
the step of supplying a plurality of monomers to a tagged polymerase confined on a substrate, 
exposing the tagged polymerase to light continuously or periodically and measuring an 
intensity and/or frequency of fluorescent light emitted by the tagged polymerase over time. 
The method can further comprise relating changes in the measured intensity and/or frequency 
of emitted fluorescent light from the tag over time to the occurrence (timing) of monomer 
addition and/or to the identity of each incorporated monomer and/or to the near simultaneous 
determination of the sequence of the incorporated monomers. 
COOPERATIVELY TAGGED SYSTEMS 

The present invention provides cooperatively tagged polymerizing agents and tagged 
monomers, where a detectable property of at least one of the tags changes when the tags 
interact before, during aiul/or after monomer insertion. In one preferred embodiment, the tag 
on the polymerase is positioned such that tiie tags interact before, during and/or after each 
monomer insertion. In tiie of case tags that are released from the monomers after monomer 
insert such as of P and/or y phosphate tagged dNTPs, /. e. , the tags reside on the P and/or y 
phosphate groups, the tag on the polymerizing agent can be designed to interact with tiie tag 
on the monomer only after the tag is released from the polymerizing agent after monomer 
insertion. Tag placement within a polymerizing agent can be optimized to enhance 
interaction between the polymerase and dNTP tags by attaching the polymerase tag to sites 
on the polymerase that move during an incorporation event changing the relative separation 
of the two tags or optimized to enhance interaction between the polymerase tag and the tag 
on the pyrophosphate as it is release during base incorporation and prior to its difiusion away 
from the polymerizing agent. 

The present invention provides cooperatively tagged polymerizing agents and tagged 
monomers, where a detectable property of at least one of the tags changes when the tags are 
within a distance sufficient to cause a measurable change in the detectable property. If the 
detectable property is fluorescence induced in one tag by energy transfer to the other tag or 
due to one tag quenching the fluorescence of the other tag or causing a measurable change 
in the fluorescence intensity and/or frequency, the measurable change is caused by bringing 
the tags into close proximity to each other, i.^., decrease the distance separating the tags. 
Generally, the distance needed to cause a measurable change in the detectable property is 
within (less than or equal to) about lOoA, preferably within about 5oA, particularly within 
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about 25A, especially within about 15 A and most preferably within about lOA. Of course, 
one skilled in the art will recognize that a distance sufficient to cause a measurable change 
in a detectable property of a tag will depend on many parameters including the location of the 
tag, the nature of the tag, the solvent system, external fields, excitation source intensity and 
frequency band width, temperature, pressure, etc. 

The present invention provides a tagged polymerizing agent and tagged monomer 
precursor(s), where an intensity and/or frequency of fluorescence light emitted by at least one 
tag changes when the tags interact before, during and/or after monomer insertion. 

The present invention provides cooperatively tagged depolymerizing agents and 
tagged depolymerizable polymer, where a detectable property of at least one of the tags 
changes when the tags interact before, during and/or after monomer release. The tag on the 
depolymerizing agent can be designed so that the tags mteract before, during and/or after each 
monomer release. 

The present invention provides cooperatively tagged depolymerizing ^ents and 
tagged polymers, where a detectable property of at least one of the tags changes when the tags 
are within a distance sufficient to cause a change in measurable change in the detectable 
property. If the detectable property is fluorescence induced in one tag by energy transfer to 
file other tag or due to one tag quenching the fluorescence of the other tag or causing a 
measurable change in the fluorescence intensity and/or frequency, the measurable change is 
caused by bringing to tags into close proximity to each other, i.e., decrease the distance 
separating the tags. Generally, the distance needed to cause a measurable change in the 
detectable property is within (less than or equal to) about 1 00 A, preferably within about 50A, 
particularly within about 25 A, especially within about 1 5 A and most preferably within about 
10 A. Of course, one skilled m the art will recognize that a distance sufficient to cause a 
measurable change in a detectable property of a tag will depend on many parameters 
including the location of tiie tag, tiie nature of the tag, the solvent system, external fields, 
excitation source intensity and frequency band width, temperature, pressure, etc. 

The present invention provides a tagged depolymerizmg agents and a tagged polymer, 
where an intensity and/or frequency of fluorescence light emitted by at least one tag changes 
when the tags interact before, during and/or after monomer release, 
COOPERATIVELY TAGGED SYSTEMS USING A POLYMERASE 

The present invention provides cooperatively tagged polymerase and tagged 
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monomers, where a detectable property of at least one of the tags changes when the tags 
interact before, during and/or after monomer insertion. The tag on the polymerase can be 
designed so that the tags interact before, during and/or after each monomer insertion. In the 
of case tags that are released from the monomers after monomer insert such as of p and/or 
Y phosphate tagged dNTPs, ie.^ the tags reside on the p and/or y phosphate groups, the tag 
on the polymerizing agent can be designed to interact with the tag on the monomer only after 
the tag is released from the polymerizing agent after monomer insertion. In &e first case, the 
polymerase tag must be located on a site of the polymerase which allows the polymerase tag 
to interact with the monomer tag during the monomer insertion process - initial binding and 
bonding into the growing polymer. While in the second case, the polymerase tag must be 
located on a site of the polymerase which allows the polymerase tag to interact with the 
monomer tag now on the released pyrophosphate prior to its diffusion away from the 
polymerase and into the polymerizing nledium. 

The present invention provides cooperatively tagged polymerase and tagged 
monomers, where a detectable property of at least one of the tags changes when the tags are 
within a distance sufScient or in close proximity to cause a measurable change in the 
detectable property. If the detectable property is fluorescence induced in one tag by energy 
transfer to the other tag or due to one tag quenching the fluorescence of the other tag or 
causing a measurable change in the fluorescence intensity and/or frequency, the measurable 
change is caused by bringing to tags into close proximity to each other, /.e., decrease the 
distance separating the tags. Generally, the distance or close proximity is a distance between 
about lOOA and about lOA. Alternatively, the distance is less than or equal to about lOOA, 
preferably less than or eqxial to about 50A, paiticxilarly less than or equal to about 25A, 
especially less than or equal to about 15 A and most preferably less than or equal to about 
loA. Of course, one skilled in the art will recognize that a distance sufficient to cause a 
measurable change in a detectable property of a tag will depend on many parameters 
including the location of the tags, the nature of the tags, the solvent Systran (polymerizing 
medium), external fields, excitation source intensity and frequency band width, tempeiature, 
pressure, etc. 

The present invention provides a tagged polymerase and tagged monomer precursors, 
where the tags form a fluorescently active pair such as a donor-acceptor pair and an intensity 
and/or fi:equency of fluorescence light emitted by at least one tag (generally the acceptor tag 
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in donor-acceptor pairs) changes when the tags interact. 

The present invention provides a tagged polymerase and a tagged monomer 
precursors, where the tags form a fluorescently active pair such as a donor-acceptor pair and 
an intensity and/or frequency of fluorescence light emitted by at least one tag (generally the 
acceptor tag in donor-acceptor pairs) changes when the tags are a distance sufficient or in 
close proximity to change either the intensity and/or frequency of the fluorescent light 
Generally, the distance or close proximity is a distance between about lOoA and about loA. 
Alternatively, the distance is less than or equal to about lOOA, preferably less than or equal 
to about 50A, particularly less than or equal to about 25 A, especially less than or equal to 
about IsA and most preferably less than or equal to about loA. Ofcourse, one skilled in the 
art will recognize that a distance sufficient to cause a measurable change in a detectable 
property of a tag will depend on many parameters mcluding the location of the tag, the nature 
of the tag, the solvent system, external fields, excitation source intensity and frequency band 
width, temperature, pressure, etc. 

The present invention provides a single-molecule sequencing apparatus comprising 
a container having at least one tagged polymerase confined on or tethered to an interior 
surface thereof and having a solution containing a plurality of tagged monomers in contact 
with flie interior surface. 

MOLECULAR DATA STREAM READING METHODS AND APPARATUS 

The present invention provides a method for single-molecule sequencmg comprismg 
the step of supplying a plurality of tagged monomers to a tagged polymerase confined on an 
interior surface of a container, exposing the tagged polymerase to light and measuring an 
intensity and/or frequency of fluorescent light emitted by the tagged polymerase during each 
successive monomer addition or insertion into a growing polymer chain. The method can 
further comprise relating the measured intensity and/or frequency of emitted fluorescent light 
to incorporation events and/or to the identification of each inserted or added monomer 
resulting in a near real-time or real-time readout of the sequence of the a growing nucleic add 
sequence - DNA sequence, RNA sequence or mixed DNA/RNA sequences. 

The present invention also provides a system for retrieving stored information 
comprising a molecule having a sequence of known elements representing a data stream, a 
smgle-molecule sequencer comprising a polymerase having at least one tag associated 
therewith, an excitation source adapted to excite at least one tag on the polymerase, and a 
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detector adapted to detect a response from the excited tag on the polymerase, where the 
response from the at least one tag changes during polymerization of a complementary 
sequence of elements and the change in response represents a content of the data stream. 

The present invention also provides a system for determining seqxience information 
from a single-molecule comprising a molecule having a sequence of known elements, a 
single-molecule sequencer comprising a polymerase having at least one tag associated 
therewith, a excitation source adapted to excite at least one tag on the polymerase, and a 
detector adapted to detect a response from the excited tag on the polymerase, where the 
response from at least one tag changes during polymoization of a complementary sequence 
of elements representing the element sequence of the molecule. 

The present invention also provides a system for determining sequence information 
from a single-molecule comprising a molecule having a sequence of known elements, a 
single-molecule sequencer comprising a polymerase having at least one fluorescent tag 
associated therewith, an excitation light source adapted to excite at least one fluorescent tag 
on the polymerase and/or monomer and a fluorescent light detector adapted to detect at least 
an intensity of emitted fluorescent light from at least one fluorescent tag on the polymerase 
and/or monomer, where the signal intensity changes each time a new nucleotide or nucleotide 
analog is polymerized into a complementary sequence and either the dxiration of the emission 
or lack of emission or the wavelength range of the emitted light evidences the particular 
nucleotide or nucleotide analog polymerized into the sequence so that at the completion of 
the sequencing the data stream is retrieved. 

The present invention also provides a system for storing and retrieving data 
comprising a sequence of nucleotides or nucleotide analogs representing a given data stream; 
a single-molecule sequencer comprising a polymerase having at least one fluorescent tag 
covalently attached thereto; an excitation light source adapted to excite the at least one 
fluorescent tag on the polymerase and/or monomer; and a fluorescent light detector ad^ted 
to detect emitted fluorescent light from at least one fluorescent tag on the polymerase and/or 
monomer, where at least one fluorescent tag emits or fails to emit fluorescrat light each time 
a new nucleotide or nucleotide analog is polymerized into a complementary sequence and 
either tfie duration of &e emission or lack of emission or the wavel^igth range of the emitted 
light evidences the particular nucleotide or nucleotide analog polymerized into the sequence 
so that at the completion of the sequencing the data stream is retrieved. 
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The tenn monomer as used herein means any compound that can be incorporated into 
a growing molecular chain by a given polymerase. Such monomers include, without 
limitations, naturally occurring nucleotides (e.g., ATP, GTP, TTP, UTP, CIP. dATP, dOTP, 
dTTP, dUTP, dCTP, synthetic analogs), precursors for each nucleotide, non-naturaUy 

occurringnucleotides and theirprecursorsorany other mole<aUe that can be incorporated 
a growing polymer chain by a given polymerase. Additionally, amino adds (natural or 
synthetic) for protein or protein analog synthesis, mono sacchairides for carbohydrate 
synthesis or other monomeric syntheses. 

The term polymerase as used herein means any molecule or molecular assembly that 
can polymerize a set of monomers m1» apolymer having a predetermined sequence of the 
monomers, including, without limitation, naturally occurring polymerases or reverse 
transcriptases, mutated naturally occurring polymerases or reverse transcriptases, where the 
mutation involves the replacement of one or more or many amino acids with other amino 
acids, the insertion or deletion of one or more or many amino acids from the polymerases or 
reverse transcriptases, or the conjugation of parts of one or more polymerases or reverse 
transcriptases, non-naturally occurring polymerases or reverse transcriptases. The terai 
polymerase also embraces synthetic molecules or molecular assembly that can polymerize 
a polymer having a pre-determined sequence of monomers, or any other molecule or 
molecular assembly that may have additional sequences that faciUtate purification and/or 
immobilization and/or molecular interaction of the tags, and that can polymerize a polymer 
having a pre-determined or specified or templated sequence of monomers. 
Single She Tagged Polymerizing nr nft polvmerizinp Ag«iif« 

The present invention provides a composition comprising a polymerizing agent 
including at least one molecular and/or atomic tag located at or near, associated with or 
covalently bonded to a site on the agent, where a detectable property of the tag undergoes a 
change before, during and/or after monomer incorporation. 

The present invention provides a composition comprising a polymerizmg agent 
including at least one molecular and/or atomic tag located at or near, associated with or 
covalently bonded to a site on the agent, where a detectable property has a first value ^ 

the polymerase is inafirst state andasecond value when the polymerase is inasecond state 
during monomer incorporation. 

Tbs present invention provides a composition con^jrising a depolymerizing agent 
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including at least one molecular and/or atomic tag located at or near, associated with or 
covalently bonded to a site on the agent, where a detectable property of the tag undergoes a 
change before, during and/or after monomer removal. 

The present invention provides a composition comprising a polymerizing agent 
including at least one molecular and/or atomic tag located at or near, associated with or 
covalently bonded to a site on the agent, where a detectable property has a first value when 
the polymerase is in a first state and a second value when the polymerase is in a second state 
during monomer removal. 
Single Site Tagged Polymerase 

The present invention provides a composition comprising a polymerase including at 
least one molecular and/or atomic tag located at or near, associated with or covalently bonded 
to a site on the polymerase, where a detectable property of the tag imdergoes a change before, 
during and/or after monomer incorporation. 

The present invention provides a composition comprising a polymerase including at 
least one molecular and/or atomic tag located at or near, associated with or covalently bonded 
to a site on the polymerase, where a detectable property has a first value ^en the 
polymerase is m a first state and a second value when the polymerase is in a second state 
during monomer incorporation. 

The present invention provides a composition comprising an exonuclease including 
at least one molecular and/or atomic tag located at or near, associated with or covalently 
bonded to a site on the ^ent, where a detectable property of the tag undergoes a change 
before, during and/or after monomer removal. 

The present invention provides a A composition comprising an exonuclease including 
at least one molecular and/or atomic tag located at or near, associated with or covalently 
bonded to a site on the agent, where a detectable property has a first value when the 
polymerase is in a first state and a second value when the polymerase is in a second state 
during monom^ removal. 

The present invention provides a composition comprising an enzyme modified to 
produce a detectable response prior to, during and/or after interaction with an appropriately 
modified monomer, where the monomers are nucleotides, nucleotide analogs, amino acids, 
amino acid analogs, monosaccarides, monosaccaride analogs or naJxtures or combinations 
thereof. 
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The present invention provides a composition comprising a polymerase including at 
least one molecular tag located at or near, associated with or covalently bonded to a site that 
undergoes conformational change during monomer incorporation, where the tag has a first 
detection propensity v/hcn the polymerase is in a first conformational state and a second 
detection propensity when the polymerase is in a second conformational state. 

The present invention provides a composition comprising a polymerase incl\iding at 
least one chromophore located at or near, associated with or covalently bonded to a site that 
undergoes conformational change during monomer incorporation, where an intensity and/or 
firequency of emitted light of the tag has a first value when the polymerase is in a firet 
conformational state and a second value when the polymerase is in a second conformational 
state. 

The present invention provides a composition comprising a polymerase including at 
least one molecular tag located at or near, associated with or covalently bonded to a site that 
undergoes conformational change during monomer incorporation, where the tag has a first 
fluorescence propensity when the polymerase is in a first conformational state and a second 
fluorescence propensity when the polymerase is in a second conformational state. 

The present invention provides a composition comprising a polymerase including a 
molecular tag located at or near, associated with or covalently bonded to a site that undergoes 
conformational change during monomer incorporation, where the tag is substantially active 
when the polymerase is in a first conformational state and substantially inactive when the 
polymerase is m a second conformational state or substantially inactive when the polymerase 
is in the first conformational state and substantially active when the polymerase is in the 
second conformational state. 

Multiple Site Tagged Polymerizing and Depolvmeriwinr A^ mt^ 

The present invention provides a composition comprising a polymerizing agent 
including at least one pair of molecular tags located at or near, associated with or covalently 
bonded to a site of the agent, where a detectable property of at least one of Ac tags undergoes 
a change before, during and/or after monomer incorporation. 

The present invention provides a composition comprismg a polymerizing agent 
including at least one pair of molecular tags located at or near, associated with or covalendy 
bonded to a site of the agent, where a detectable property has a fia^t value when the 
polymerase is in a first state and a second value when the polymerase is m a second state 
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during monomer incorporation. 

The present invention provides a composition comprising a depolymerizing agent 
including at least one pair of molecular tags located at or near, associated with or covalently 
bonded to a site of the agent, where a detectable property of at least one of the tags imdergoes 
a change before, during and/or after monomer removal. 

The present invention provides a composition comprising a depolymerizing agent 
including at least one pair of molecular tags located at or near, associated with or covalently 
bonded to a site of the agent, where a detectable property has a first value when the 
polymerase is in a first state and a second value when the polymerase is in a second state 
during monomer removal. 
Multiple Site Tagged Polymerase 

The present invention provides a composition comprising a polymerase including at 
least one pair of molecular tags located at or near, associated with or covalently bonded to a 
site of the polymerase, -wbsre a detectable property of at least one of the tags undergoes a 
change before, during and/or after monomer incorporation. 

The present invention provides a composition conq)rising a polymerase including at 
least one pair of molecular tags located at or near, associated with or covalently bonded to a 
site of the polymerase, where a detectable property has a first value when the polymerase is 
in a first state and a second value when the polymerase is in a second state during monomer 
incorporation. 

The present invention provides a composition comprising an exonuclease including 
at least one pair of molecular tags located at or near, associated with or covalently bonded to 
a site of the polymerase, where a detectable property of at least one of the tags undergoes a 
change before, during and/or after monomer removal. 

The present invention provides a composition comprising an exonuclease including 
at least one pair of molecular tags located at or near, associated with or covalently bonded to 
a site of the polymerase, where a detectable property has a first valye when the polymerase 
is in a first state and a second value when the polymerase is in a second state during monomer 
removal. 

The present invention provides a composition comprising a polymerase including at 
least one pair of molecular tags located at or near, associated with or covalently bonded to a 
site that und^goes conformational change during monomer incorporation, where the 
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detectable property of the pair has a first value when the polymerase is in a firet 
conformational state and a second value when the polymerase is in a second confoimational 
state. 

The present invention provides a composition comprising a polymerase including at 
least one pair of molecules or atoms located at or near, associated with or covalently bonded 

toasite that undergoes conformational change during monomer incorporation, where thepair 
interact to form a chromophore when the polymerase is in a first conformational state or a 
second conformational state. 

The present invention provides a con^sition comprising a polymerase including at 
least one pair of molecular tags located at or near, associated with or covalently bonded to a 
site that undergoes conformational change during monomer incorporation, where the tags 
have a first fluorescence propensity when the polymerase is in a first conformational state and 
a second fluorescence propensity when the polymerase is in a second conformational state. 

The present invention provides a composition comprising a polymerase including at 
least one pair of molecular tags located at or near, associated with or covalenfly bonded to a 
site that undergoes conformational change during monomer incorporation, where the pair is 
substantially active when the polymerase is in a first conformational state and substantially 
inactive when the polymerase is in a second conformational state or substantially inactive 
when the polymerase is in the first conformational state and substantially active when the 
polymerase is in the second conformational state. 
Methods Using Tagged Polymerase 

The present invention provides a metiiod for determining when a monomer is 
incorporated into a growing molecular chain comprising tiie steps of monitoring a detectable 
property of a tag, where the tag is located at or near, associated with or covalentiy bonded to 
a site on a polymerase or associated witii or covalently bonded to a site on tiie monomer, 
where tiie site undergoes a change during monomer incorporation and where the detectable 
property has a first value when the polymerase is in a first state and a second value vihea the 
polymerase is in a second state and cycles fiom tiie first value to the second value during each 
monomer addition. 

The present invention provides a mefliod for determining when a monomer is 
incorporated into a growing molecular chain comprising tiie steps of monitoring a detectable 
property of a tag, where tiie tag is located at or near, associated witii or covalentiy bonded to 
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a site on a polymerase or associated with or covalently bonded to a site on the monomer, 
where the site undergoes a conformational change during monomer incorporation and where 
the detectable property has a first value when the polymerase is in a first conformational state 
and a second value when the polymerase is in a second conformational state and cycles fi:om 
the first value to the second value during each monomer addition. 

The present invention provides a method for determining when a monomer is 
incorporated into a growing molecular chain comprising the steps of exposing a tagged 
polymerase to light, monitoring an intensity and/or firequency of fluor^cent light emitted by 
the tagged polymerase and/or monomer, where the tagged polymerase comprises a 
polymerase includuig a tag located at or near, associated with or covalently bonded to a site 
that undergoes conformational change during monomer incorporation or associated with or 
covalently bonded to a site on the monomer and where the tag emits fluorescent light at a first 
intensity and/or firequency when the polymerase is m a first conformational state and a second 
intensity and/or firequency when the polymerase is in a second conformational state and 
cycles firom the first value to ttie second value during each monomer addition. . 
Sin gle-molecule Sequencing Apparatus Using Ta gged Polymerase 

The present invention provides a composition comprising a single-molecule 
sequencing apparatus comprising a substrate having a chamber or chip surface in which at 
least one tagged polymerase is confined therein and a plurality of chambers, each of which 
includes a specific monomer and a plurality of channels interconnecting the chambers, where 
each replication complex is sufGcientiy distant to enable data collection firom each complex 
individually. 

The present invention provides a method for single-molecule sequencing comprising 
the steps of supplying a plurality of monomers to a tagged polymerase confined on a 
substrate, exposing the tagged polymerase to light and measuring an intensity and/or 
fi^quency of fluorescent light emitted by the tagged polymerase. The method can finrther 
comprise the step of relating the measured mtensity and/or firequency of emitted fluorescent 
light to incorporation of a specific monomer into a growing DNA chain. 
Cooperatively Tagged Monomers and Tagged Polvmering Agent 

The present invention provides a composition comprising a cooperatively tagged 
polymerizing agent and tagged monomers, where a detectable property of at least one of the 
tags changes ^en the tags interact 
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The present invention provides a composition comprising a cooperatively tagged 
depoiymerizing agent and tagged depolymerizable monomers, where a detectable property 
of at least one of the tags changes vfhea the tags interact. 
CoQP<?rff<PY?lY T^ggeg Monomers and Tayy««i PohrmemlM. 

The present invention provides a composition comprising a cooperatively tagged 
polymerase and tagged monomers, where a detectable property of at least one of the tags 
changes when the tags interact 

The present invention provides a composition comprising a cooperatively tagged 
polymerase and tagged monomers, where a detectable property of at least one of the tags 
changes v*en the tag are within within a distance sufficient to cause a change in the intensity 
and/or frequency of emitted fluorescent light. 

The present invention provides a composition comprising a tagged polymerase and 
tagged monomer precursors, where an intensity and/or fiequency of fluorescence light 
emitted by at least one tag changes when the tags intraact 

The present invention provides a composition comprising a tagged polymerase and 
a tagged monomer precursors, where an intensity and/or frequency of fluorescence light 
emitted by at least one tag changes when the tags are within a distance sufficient to cause a 
change in the intensity and/or frequency of emitted fluorescent light. 

The present invention provides a single-molecule sequencing apparatus comprising 
a container having at least one tagged polymerase confined on an interior surface thereof and 
having a solution containing a plurality of tagged monomers in contact with the interior 
surface or a subset of tagged monomers and a subset of untagged monomers which together 
provide all monomers precursor for polymerization. 

The present invention provides a method for single-molecule sequencing comprismg 
the steps of supplying a plurality of lagged monomers to a tagged polymerase confined on an 
interior surface of a container, exposing the tagged polymerase to light and measuring an 

intensity and/orfrequency of fluorescent lightemittedby the taggedpolymerase. The melliod 
can further comprise relating the measured intensity and/or frequency of emitted fluorescent 
light to incorporation of a specific monomer into a growing DNA chain. 

The present inventionprovidesasystem for retrieving stored information comprising: 
(a) a molecule having a sequence of elements representing a data stream; (b) a single- 
molecule sequencer coniprising a polymerase having at least one tag associated therewith; 
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(c) an excitation source adapted to excite the at least one tag on the polymerase; and (d) a 
detector adapted to detect a response firom the tag on the polymerase or on the monomers; 
where the response from at least one tag changes during polymerization of a complementary 
sequence of elements and the change in response represents a data stream content. 

The present invention provides a system for determining sequence information from 
a single-molecule comprising: (a) a molecule having a sequence of elements; (b) a single- 
molecule sequencer comprising a polymerase having at least one tag associated therewith; 
(c) an excitation source adapted to excite at least one tag on the polymerase or on the 
monomers; and (d) a detector adapted to detect a response from the tag on the polymerase; 
where the response from at least one tag changes during polymerization of a complementary 
sequence of elements representing the element sequence of the molecule. 

The present invention provides a system for determining sequence information from 
an individual molecule comprismg: (a) a molecule having a sequence of elements; (b) a 
single-molecule sequencer comprising a polymerase having at least one fluorescent tag 
associated therewith; (c) an excitation light source adapted to excite the at least one 
fluorescent tag on the polymerase or on tiie monomers; and (d) a fluorescent light detector 
adapted to detect at least an intensity of emitted fluorescent light from the at least one 
fluorescent tag on the polymerase; where the intensity change of at least one fluorescent tag 
emits or fails to emit fluorescent light each time a new nucleotide or nucleotide analog is 
polymerized into a complementary sequence and either the duration of the emission or lack 
of emission or the wavelength range of the emitted light evidences the particular nucleotide 
or nucleotide analog polymerized into the sequence so that at the completion of the 
sequencing the data stream is retrieved. 

The present invention provides a system for storing and retrieving data comprising: 
(a) a sequence of nucleotides or nucleotide analogs representing a given data stream; (b) a 
single-molecule sequencer comprising a polymerase having at least one fluorescent tag 
covalently attached thereto; (c) an excitation light source adapted to excite at least one 
fluorescent tag on the polymerase; and (d) a fluorescent light detector adapted to detect 
emitted fluorescent light from at least one fluorescent tag on the polymerase; where at least 
one fluorescent tag emits or fails to emit fluorescent light each time a new nucleotide or 
nucleotide analog is polymerized into a complementary sequence and either the duration of 
tiie emission or lack of emission or the wavelength range of the emitted light evidences the 
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particular nucleotide or nucleotide analog polymerized into the sequence so that at the 
completion of the sequencing the data stream is retrieved. 

The present invention provides a system for storing and retrieving data comprising: 
(a) a sequence of nucleotides or nucleotide analogs representing a given data stream; (b) a 
single-molecule sequencer comprising a polymerase having at least one fluorescent tag 
covalently attached thereto; (c) an excitation Ught souree adapted to excite the at least one 
fluorescent tag on the polymerase or the monomers; and (d) a fluorescent Ught detector 

adaptedto detect emittedfluorescent Ught fiomatleast one fluorescenttag on the polymerase 
or the monomers; where at least one fluorescent tag emits or Ms to emit fluorescent Ught 
each time a new nucleotide or nucleotide analog is pofymerized into a complementaiy 
sequence and either the duration of the emission or lack of emission or the wavelength range 

oftheemitted light evidences theparticular nucleotide or nucleotide analogpolymerized into 
tiie sequence so that at tiie completion of the sequencing the data stream is retrieved. 

The present invention provides a method for sequencing a molecular sequence 
comprising tiie steps of: (a) a sequenced of nucleotides or nucleotide analogs representing a 
given data stream; (b) a single-molecule sequencer comprising a polymerase having at least 

onefluorescenttagcovalentiyattachedtiiereto;(c)anexcitationUghtsourceadapted to excite 
at least one fluorescent tag on tiie polymerase or tiie monomers; and (d) a fluorescent Ught 
detector adapted to detect emitted fluorescent light from at least one fluorescent tag on tiie 
polymerase; where at least one fluorescent tag emits or fails to emit fluorescent Ught each 
time a new nucleotide or nucleotide analog is polymerized into a complementary sequence 
and eitiier tiie duration of tiie emission or lack of emission or tiie wavelengfli range of tiie 
emitted Ught evidences tiie particular nucleotide or nucleotide analog polymerized hxto tiie 
sequence so tiiat at tiie completion of tiie sequencing tiie data stream is retrieved. 

The present invention provides a metiiod for syntiiesizing a y-phosphate modified 
nucleotide comprising tiie steps of attaching a molecular tag to a pyrophosphate group and 

contactingtiiemodifiedpyrophosphatewitiiadNMPtoproduceaY-phosphatetaggeddNTP. 

ThepresentinventionprovidesametiiodforS'end-labeUngabiomoleculecomprising 
tiie step of contacting tiie biomolecule wltii a kinase able to transfer a y-phosphate of a y- 
Phosphate labeled ATP to tiie 5' end of tiie biomolecule resulting in a covalenfly modified 
biomolecule. 

The present invention provides a mefliod for end-labeUng a polypeptide or 
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carbohydrate comprising the step of contacting the polypeptide or carbohydrate with an agent 
able to transfer an atomic or molecular tag to either a carboxy or amino end of a protein or 
polypeptide or to either the yphosphate of a y-phosphate labeled ATP to the 5' end of the 
biomolecule resulting in a covalently modified biomolecule. 

DESCRIPTION OF THE DRAWINGS 
The invention can be better understood with reference to the following detailed 
description together with the appended illustrative drawings in which like elements are 
numbered the same: 

Figure 1 depicts FRET activity as a function of distance separating the fluorescent 
donor and acceptor; 

Figure 2 dq)icts the open and closed ternary complex forms of the large fiagment of 
Taq DNA pol I (Klentaq 1); 

Figures 3A-C depicts an overlay between 3ktq (closed 'black') and Itau (open light 
blue*), the large fragment of Taq DNA polymerase I; 

Figure 4 depicts an image of a 20% denaturing polyaoyamide gel containing size 
separated radiolabeled products from DNA extension experiments involving yANS- 
phosphate-dATP; 

Figure S depicts an image of (A) the actual gel, (B) a lightened phosphorimage and 
(C) an enhanced phosphorimage of products generated in DNA extension reactions using y- 
ANS-phosphate-dNTPs; 

Figure 6 depicts an image of (A) 6% denaturing polyacrylamide gel, (B) a lightened 
phosphorimage of the actual gel, and (C) an enhanced phosphorimage of the actual gel 
containing products generated in DNA extension reactions using y-ANS-phosphate-dNTPs; 
DETAILED DESCRIPTION OF THE INVENTION 

The inventors have devised a methodology using tagged monomers such as dNTPs 
and/or tagged polymerizing agents such as polymerase and/or tagged agents associated with 
the polymerizing agent such as polymerase associated proteins or probes to directly readout 
tiie exact monomer sequence such as a base sequence of an RNA or DNA sequence during 
polymerase activity. The methodology of this invention is adaptable to protein synthesis or 
to carbohydrate synthesis or to the synthesis of any molecular sequence vAitrt the sequence 
of monomers provides usable information such as the sequence of a RNA or DNA molecule, 
a protein, a carbohydrate, a mixed biomolecule or an morganic or organic sequence of 
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monomers wWchstoresadata Stream. The methods and apparatuses using thesemethoA^ 
designed to create new ways to address basic research questions such as monitoring 
conformation changes occurring during replication and assaying polymerase incorporation 
fidelity in a variety of sequence contexts. The single-molecule detection systems of this 
invention are designed to improve fluorescent molecule chemistry, computer modeling. 
base-caUing algorithms, and genetic engineering of Womolecules. especially for real-time or 
near real-time sequencing. The mventors have also found that the methodology can be 
adapted to depolymerizing agents such as exonucleases where the polymer sequence is 
determined by depolymerization instead of polymerizatioa Moreover, the single-molecule 
systems of this invention are amendable to parallel and/or massively parallel assays, where 
tagged polymerases are pattemed in arrays on a substrate. The data collected fiom such 

arrays can be used to improve sequence confidence and/or to simultaneously sequence DNA 
regions from many different sources to identify similarities or differences. 

The single-molecule DNA sequencing systems of this invention have the potential to 
replace current DNA sequencing technologies, because the methodology can decrease time, 
jlabor, and costs associated with the sequencmg process, and can lead to highly scalable 
sequencing systems, improving the DNA sequence discovery process by at least one to two 
orders of magnitude per reaction. 

The pattern of emission signals is coUected, either directly, such as by an Intensitifed 
Charge Coupled Devise aCCD) or through an intemiediate or series of intermediates to 
amplify signal prior to electronic detection, where the signals are decoded and confidence 

values are assignedtoeachbase to reveal the sequence complementary to thatofthe template. 
Thus, the present invention also provides techniques for amplifying the fluorescent light 
emitted &om a fluorescent tag using physical light amplification techniques or molecular 
cascadmg agent to amplify the Ught produced by single-molecular fluorescent events. 

The single-molecule DNA sequencing technology of this invention can: (I) make it 

easier to classify an organism oridentify variations within an organism by simply sequenci^ 
the genome or a portion thereof; (2) make rapid identification of a pathogen or a 
genetically-modified pathogen easier. especiaUy in extreme circumstances such as in 
pathogens used in warfare; and (3) make rapid identification of persons for either law 
enforcement and military applications easier. 

One embodiment of the single-molecule sequencmg technology of this invention 
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involves strategically positioning a pair of tags on a DNA polymerase so that as a dNTP is 
incorporated during the polymerization reaction, the tags change relative separation. This 
relative change causes a change in a detectable property, such as the intensity and/or 
frequency of fluorescence from one or both of the tags. A time profile of these changes in 
the detectable property evidences each monomer incorporation event and provides evidence 
about which particular dNTP is being mcoipoiated at each incorporation event. The pair of 
tags do not have to be covalently attached to the polymerase, but can be attached to molecules 
that associate with the polymerase in such a way that the relative separation of the tags 
change during base incorporation. 

Another embodiment of the single-molecule sequencing technology of this invention 
involves a single tag strategically positioned on a DNA polymerase that interacts with a tag 
on a dNTP or separate tags on each dNTP. The tags could be different for each dNTP such 
as color-coded tags which emit a different color of fluorescent light. As the next dNTP is 
incorporated during the polymerization process, the identity of the base is indicated by a 
signature fluorescent signal (color) or a change in a fluorescent signal intensity and/or 
frequency. The rate of polymerase incorporation can be varied and/or controlled to create an 
essentially "real-time" or near "real-time" or real-time readout of polymerase activity and 
base sequence. Sequence data can be collected at a rate of > 1 00,000 bases per hour from each 
polymerase. 

In another embodiment of the single-molecule sequencing technology of this 
invention, the tagged polymerases each include a donor tag and an acceptor tag situated or 
located on or within the polymerase, where the distance between the tags changes diiring 
dNTP binding, dNTP incorporation and/or chain extension. This change in inter-tag distance 
results in a change in the intensity and/or wavelengtii of emitted fluorescent light from the 
fluorescing tag. Monitoring the changes in intensity and/or frequency of the emitted light 
provides information or data about polymerization events and the identity of incorporated 
bases. 

In another embodiment, the tags on the polymerases are designed to interact with the 
tags on ttie dNTPs, where the interaction changes a detectable property of one or both of the 
tags. Each fluorescentiy tagged polymerase is monitored for polymerization using tagged 
dNTPs to determine the efiQcacy of base incorporation data derived therefrom. Specific 
assays and protocols have been developed along with specific analytical equipmrnt to 
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measure and quantify the fluorescent data allowing the detennination and identification of 
each incorporated dNTP. Concunently, the inventors have identified tagged dNTPs that are 
polymerized by suitable polymerases and have developed software that analyze the 
fluorescence emitted from the reaction and mterpret base identity. One skilled m the art will 
recognize that appropriate fluorescently active pairs are well-known m the art and 
commercially available fix>m sudi vendors as Molecular Probes locMed in Oregon or 
Bioseareh Technologies, Inc. in Novato, CA. 

The tagged DNA polymerase for use in this mvention are genetically engineered to 
provide one or more tag bindmg sites that aUow the different embodunents of this invention 
to operate. Once a suitable polymerase candidate is identified, specific amino acids within 
the polymerase are mutated and/or modified such reactions weU-known in the art; provided, 
however, that the mutation and/or modification do not significantly adversely affect 
polymerization efiSciency. The mutated and/or modified amino acids are adapted to fecilitate 
tag attachment such as a dye or fluorescent donor or acceptor molecule in the case of light 
activated tags. Once formed, the engineered polymerase can be contacted with one or more 
appropriate tags and used in the apparatuses and methods of this invention. 

Engineering a polymerase to function as a direct molecular sensor of DNA base 
identity provides a route to a fast and potentially real-time enzymatic DNA sequencing 
system. The smgle-molecule DNA sequencing system of this invention can significantly 
reduce time, labor, and costs associated with the sequencing process and is highly scalable. 
The single-molecule DNA sequencing system ofthis invention: (1) can improve the sequence 
discovery process by at least two orders of magnitude per reaction; (2) is not constrained by 
the lengdi limitations associated with the degradation-based, single-molecule methods; and 
(3) aUows direct sequencing of desired (target) DNA sequences, especiaUy genomes without 
the need for cloning or PGR amplification, both of which introduce eiiors m the sequence. 
The systems of this mvention can make easier the task of classifying an organism or 
identifying variations withm an organism by simply sequencing the genome in question or 
any desired portion of the genome. The system of this mvention is ad^jted to rapidly identify 
pathogensorengineeredpathogens, which has hnportance for assessinghealth-related effects, 
and forgeneral DNA diagnostics, including cancer detection and/or characterization, genome 
analysis, or a more comprehensive form of genetic variation detection. The single-molecule 
DNA sequencing system of this invention can become an enabling platform technology for 
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single-molecule genetic analysis. 

The single-molecule sequencing systems of this invention have the foUovsring 
advantages: (1) the systems eliminates sequencing reaction processing, gel or capillary 
loading, electrophoresis, and data assembly; (2) the systems results in significant savings in 
labor, time, and costs; (3) the systems allows near real-time or real-time data acquisition, 
processing and determination of incorporation events (timing, diiration, etc,% base sequence, 
etc.; (4) the systems allows parallel or massively parallel sample processing in microarray 
format; (S) the systems allows rapid genome sequencing, in time firames of a day or less; (6) 
the systems requires very small amoimt of material for analysis; (7) the systems allows rapid 
genetic identification, screening and characterization of animals including humans or 
pathogen; (8) the systems allows large increases in sequence throughput; (9) the system can 
avoid error introduced in PGR, RT-PCR, and transcription processes; (10) the systems can 
allow accurate sequence information for aUele-specific mutation detection; (1 1) the systems 
allows rapid medical diagnostics, e.g. , Single Nucleotide Polymorphism (SNP) detection; (12) 
the systems allows improvement in basic research, e.g., examination of polymerase 
incorporation rates in a variety of different sequence conteTcts; analysis of errors in different 
contexts; epigenotypic analysis; analysis of protein glycosylation; protein identification; (13) 
the systems allows the creation of new robust (rugged) single-molecule detection apparatus; 
(14) the systems allows the development of systems and procedures that are compatible with 
biomolecules; (15) the systems allows the development genetic nanomachines or 
nanotechnology; (1 6) the systems allows the construction of large genetic databases and (17) 
the system has high sensitivity for low mutation event detection. 
BRIEF OVERVIEW OF SINGLE-MOLECULE DNA SEQUENCING 

In one embodiment of the single-molecule DNA sequencing system of this invention, 
a single tag is attached to an appropriate site on a polymerase and a unique tag is attached to 
each of the four nucleotides: dATP, dTTP, dCTP and dGTP. The tags on each dNTPs are 
designed to have a imique emission signature (i,e,y different emission firequency spectrum or 
color), which is directly detected vpon incorporatiorL As a tagged dNTP is incorporated into 
a growing DNA polymer, a characteristic fluorescent signal or base emission signature is 
emitted due to the int^ction of polymerase tag and the dNTP tag. The fluorescent signals, 
ie., the emission intensity and/or frequency, are then detected and analyzed to determine 
DNA base sequence. 
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One criteria for selection of the tagged polymerase and/or dNTPs for use m this 
invention is that the tags on either the polymerase and/or the dNTPs do not interfere with 
Watson-Crick base-pairing or significantly adversely impact polymerase activity. The 

inventois have found that dNm containing tags attached to the terminal (gamma)phosph^^ 
are incorporated by a native Taq polymerase either in combination with mitagged dNTPs or 

using onlytagged dNTPs. Tagging the dNl?s on thepand/oryphosphate group ispreferiw^ 
because the resulting DNA strands do not mclude any of the dNTP tags in their molecular 
make up, minimizing enzyme distortion and background fluorescence. 

One embodiment of the sequencing system of this invention involves placing a 
fluorescent donor such as fluorescein or a fluorescem-type molecule on the polymerase and 
unique fluorescent acceptors such as a d-rhodamine or a similar molecule on each dNTP, 
where each unique acceptor, when mteracting with the donor on the polymerase, generates 
a fluorescent spectrum including at least one distinguishable fiequency or spectral feature. 
As an incoming, tagged dNTP is bomid by the polymerase for DNA elongation, the detected 
fluorescent signal or spectrum is analyzed and the identity of the incorporated base is 
determined. 

Another embodimentofthe sequencing system of this invention involves afluorescent 
tag on the polymerase and unique quenchers on the dNTPs, where the quenchers preferably 
have distinguishable quenching efBciencies for the polymerase tag. Consequently, the 
identity of each incoming quencher tagged dNTP is determined by its unique quenching 
efBciency of the emission of the polymerase fluorescent tag. Again, the signals produced 
during incorporation are detected and analyzed to determine each base incorporated, the 
sequence of which generates the DNA base sequence. 
REAGENTS 

Suitable polymerizing agents for use in this invention include, without limitation, any 
polymerizmg agent that polymerizes monomers relative to a specific template such as a DNA 

or RNApolymerase, reverse transcriptase, orthelikeorthatpolymerizesmonomersinasl^ 
wise fashion. 

Suitable polymerases for use in this mvention include, without limitation, any 
polymerase that can be isolated from its host in sufficient amounts for purification and use 
and/or genetically engineered into other organisms for expression, isolation and purification 
in amounts sufficient for use in this invention such as DNA or RNA polymerases that 
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polymerize DN A, RNA or mixed sequences, into extended nucleic acid polymers. Preferred 
polymerases for use in this invention include mutants or mutated variants of native 
polymerases where the mutants have one or more amino acids replaced by amino acids 
amenable to attaching an atomic or molecular tag, w^hich have a detectable property. 
Exemplary DN A polymerases include, without limitation, HIV 1 -Reverse Transcriptase using 
either RNA or DNA templates, DNA pol I from T, aquaticus or E. coli, Bateriophage T4 
DNA pol, T7 DNA pol or the like. Exemplary RNA polymerases include, without limitatioD, 
T7 RNA polymerase or the like. 

Suitable depolymerizing agents for use in this invention include, without limitation, 
any depolymerizing agent that depolymerizes monomers in a step-wise &shion such as 
exonucleases in the case of DNA, RNA or rabced DNA/RNA polymers, proteases in the case 
of polypeptides and enzymes or enzyme systems that sequentially depolymerize 
polysaccharides. 

Suitable monomers for use in this invention include, without limitation, any monomer 
that can be step-wise polymerized into a polymer using a polymerizing agent Suitable 
nucleotides for use in this invention include, without linutation, naturally occurring 
nucleotides, synthetic analogs thereof, analog having atomic and/or molecular tags attached 
thereto, or mixtures or combinations thereof. 

Suitable atomic tag for use in this invention include, without limitation, any atomic 
element amenable to attachment to a specific site in a polymerizing agent or dNTP, especially 
Europium shift agents, mnr active atoms or the like. 

Suitable atomic tag for use in this invention include, without limitation, any atomic 
element amenable to att,achment to a specific site in a polymerizing agent or dNTP, 
especially fluorescent dyes such as d-Rhodamine acceptor dyes including dichloro[R110], 
dichloro[R6G], dichlorofTAMRA], dichloro[ROX] or the like, fluorescein donor dye 
including fluorescein, 6-FAM, or the like;Acridine including Acridine orange, Acridine 
yellow, Proflavin, pH 7, or the like; Aromatic Hydrocarbon including 2-Methylbenzoxazole, 
Ethyl p-dimethylaminobenzoate, Phenol, Pyrrole, benzene, toluene, or the Uke; Aiyhnethine 
Dyes including Auramine O, Crystal violet, H20, Crystal violet, glycerol, Malachite Green 
or the like; Coumarin dyes including 7-Methoxycoumarin-4-acetic acid, Coumarin 1, 
Coumarin 30, Coumarin 3 14, Coumarin 343, Coumarin 6 or the like; Cyanine Dye including 
1 , 1 '-diethyl-2,2'-cyanine iodide, Cryptocyanine, Indocarbocyanine (C3)dye, 
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Indodicarbocyanine (C5)dye. Indotricarbocyanine (C7)dye, Oxacarbocyanine (C3)dye. 
Oxadicarbocyanine (C5)dye, Oxatricarbocyanine (C7)dye, Pinacyanol iodide. Stains aU, 
Thiacarbocyanine (C3)dye, ethanol, Thiacarbocyanine (C3)dye. n-propanol! 
Thiadicarbocyanine (C5)dye, Thiatricarbocyanine (C7)dye. or the like; Dipyrrin dye^ 
including N,N'-Difluoroboiyl-l,9-dimethyl-5-(4-iodophenyI)-dipyrrin, N^^-Difluoroboryl- 

1.9- dimethyl-5-[(4-(2-trimethylsilylethynyl), N,N'-Difluoroboryl-l,9-dimethyl.5. 
phenydipyrrin, or the like; Merocyanines including 4.(dicyanomethylene>2-methyl-6-(p- 
dimethylaininostyiyl)-4H-pyran (DCM), acetonitrile, 4.(dicyanoinethylene>2-methyl-6-(p- 
dimethylaminostyrylMH-pyran (DCM). methanol, 4.Dimethylanuno-4'-mtn)stilbene. 
Merocyanine 540, or the like; MisceUaneous Dye including 4',6.Dianudino-2-phenylindole 
(DAPI), 4',6-Diamidino-2-phenylindole (DAPI), dimethylsulfoxide, 7-Ben2ylaniino-4- 

nitroben2.2.oxa-l,3-diazole,Dansyl glycine, H20,Dansyl glycine. dioxane,Hoechst33258, 
DMF, Hoechst 33258. H20. Lucifer yellow CH. Piroxicam, Quinine sulfate, 0.05 MH2S04, 
Quinine sulfete. 0.5 M H2S04, Squarylium dye m, or the like; OUgophenylenes including 
2,5-Diphenyloxa2ole (PPO), Biphenyl, POPOP, p-Quaterphenyl. p-Terphenyl. or the like; 
Oxazmes including Cresyl violet perchlorate, Nile Blue, methanol, NUe Red, NUe blue, 
ethanol. Oxazine 1, Oxazine 170, or the like; Polycyclic Aromatic Hydrocarbons including 

9.10- Bis(phenylethynyl)anthracene, 9.10-Diphenylanthracene. Anthracene, Naphthalene, 
Perylene. Pyrene, or the like; polyene/polyynes including 1,2-diphenylacetylene, 1,4- 
diphenylbutadiene, 1,4-diphenylbutadiyne, 1,6-Diphenylhexatriene. Beta-carotene, StUbene, 
or the like; Redox-active Chromophores including Anthraquinone. Azobenzene! 
Benzoquinone, Ferrocene, Riboflavin, Tris(2,2'-bipyridyl)ruthemum(II), Tetrapyrrole.' 

Bilirubin. Chlorophyll a, diethyl ether, Chlorophyll a, methanol, ChlorophyUb.Diprotonated- 
tetraphenylporphyrin, Hematin, Magnesium octaethylporphyrin, Magnesium 
octaethylporphyrin (MgOEP). Magnesium phthalocyanine (MgPc). PrOH, Magnesium 
phlhalocyanine (MgPc), pyridine, Magnesium tetramesitylporphyrin (MgTMP), Magnesium 
tetraphenylporphyrin (MgTPP), Octaethylporphyrin, Phthalocyanine (Pc), Porphin. Tetm-t- 
butylazaporphine, Tetra-t-butyhiaphthalocyanine. Tetrakis(2.6-dichlorophenyl)porphyrin. 
Tetrakis(o-aminophenyl)poiphyrin, Tetramesitylporphyrin (TMP), Tetraphenylporphyrin 
(TPP), Vitamin B12, Zinc octaethylporphyrin (ZnOEP), Zinc phthalocyanine (ZnPc). 
pyridine. Zinc tetramesitylporphyrin (ZnTMP), Zmc tetramesitylporphyrin radical cation,' 
Zuic tetraphenylporphyrin (ZnTPP), or the like; Xanthenes inchiding Bosin Y. Fluorescein, 
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basic ethanol, Fluorescein, ethanol, Rhodaminc 123, Rhodamine 6G, Rhodamine B, Rose 
bengal, Sulforhodamine 101, or the like; or mixtures or combination thereof or synthetic 
derivatives thereof or FRET fluorophore-quencher pairs including DLO-FBl (5-FAM/3- 
BHQ-1) DLO-TEBl (5'-TET/3'-BHQ-l), DLO-JBl (5'-JOE/3'.BHQ-l), DLO-HBl (5*- 
HEX/3'.BHQ4),DLO.C3B2 (5'-Cy3/3'-BHQ-2), DL0.TAB2 (S'-TAMRA/^ 
RB2(5*-ROX/3'-BHQ-2),DLO.C5B3 (5'-Cy5/3*-BHQ-3),DLO-C55B3 (5'.Cy5.5/3'-BHQ.3), 
MBO-FBl (5'-FAM/3*-BHQ-l), MBO-TEBl (5'-TET/3'-BHQ.l), MBO-JBl (5'-JOE/3'- 
BHQ-1), MBO-HBl (5*-HEX/3'-BHQ-l), MBO-C3B2 (5'.Cy3/3*-BHQ-2), MB0-TAB2 (5'- 
TAMRA/3*-BHQ-2), MB0-RB2 (5'-ROX/3'-BHQ-2); MBOC5B3 (5'-Cy5/3'-BHQ-3),MBO- 
C55B3 (5*-Cy5.5/3'-BHQ-3) or similar FRET pairs available from Biosearch Technologies, 
Inc. of Novato, CA, tags with nmr active groups, tags with spectral features that can be easily 
identified such as IR, far IR, visible UV, far UV or the like. 
ENZYME CHOICE 

The inventors have found that the DNA polymerase from Thermus aquaticus - Taq 
DNA polymerase I - is ideally suited for use in the single-molecule apparatuses, systems and 
methods of this invention. Taq DNA Polymerase, sometimes simply referred to herein as 
Taq^ has many attributes that the inventors can utilize in constructing tagged polymerases for 
use in the inventions disclosed in this application. Of course, ordinary artisans will recognize 
that other polymerases can be adapted for use in the single-molecule sequencing systems of 
this invention. 

Since Taq DNA polymerase I tolerates so many mutations within or near its active site 
(as reviewed in Patel et al, J. Mol Biol., volume 308, pages 823-837, and incorporated herein 
by reference), the enzyme is more tolerant of enzyme tagging modification(s) and also able 
to incorporate a wider range of modified nucleotide substrates. 
Crystal Structures Are Available for Taq DNA Polymerase 

There are 13 structures solved for Taq DNA polymerase, with or without DNA 
template/primer, dNTP, or ddNTP, v^ch allows sufBcient information for the selection of 
amino acid sites within tiie polymerase to v^ch an atomic and/or molecular tag such as a 
fluorescent tag can be attached without adversely affecting polymerase activity. See, e.g., 
Eomerd., \996\Uetal, 1998a; Lief a/., 1998b. Additionally, the inventors have a written 
program to aid in identifying optimial tag addition sites. The program compares structural 
data associated v^th the Taq polymerase in its open and closed form to identify regions in the 
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polymerase structure that are optimally positioned to optimize the difference in confonnation 
extremes between a tag on the polymerase and the dNTP or to optimize a change in 
separation between two tags on the polymerase, thereby increasing or maximizing changes 
in a detectable property of one of the tags or tag pair. 
Tag PNA Poivmeraae U Efficiently y^Y preaaed in K. (Jnf ( 

The Tag DNA polymerase is efBciently expressed in K coli allowing efficient 
production and purification of the nascent polymerase and variants thereof for rtqjid 
identification, characterization and optimization of an engineered Taq DNA polymerase for 
use in the single-molecule DNA sequencing systems of this invaition. 
No Cysteines Are Prtsscnt fn the Pratein ,« i l¥ qn«>nyy 

The Taq DNA polymerase contains no cysteines, which allows the easy generation 
of cysteine-containing mutants in which a single cysteine is placed or substituted for an 
existing amino acid at strategic sites, where the inserted cysteine serves as a tag attachment 
site. 

The Processivitv of the Rnzvme Can Rf, MndifiAri 

Although native Taq DNA polymerase may not represent an optimal polymerase for 
sequencing system of this invention because it is not a very processive polymerase (50-80 
nucleotides are incorporated before dissociation), the low processivity may be compensated 
for by appropriately modifying the base caUing software. Alternatively, the processivity of 
the Taq DNA Polymerase can be enhanced through genetic engineering by inserting into the 
polymerase gene a processivity enhancing sequence. Highly processive polymerases are 
expected to minimize complications that may arise from template dissociation effects, which 
can alter polymerization rate. The processivity of Taq can be genetically altered by 
introducing the 76 amino acid 'processivity domain' from T7 DNA polymerase between the 
HandH, heUces (at the tip of thumb' region within the polymerase) of Taq. The processivity 
domain also includes the thioredoxm binding domain (TBD) from T7 DNA polymerase 
causing the Taq polymerase to be thioredoxin-dependent increasing both the processivity and 
specific activity of TVz? polymerase. See, Bedford g^ai. 1997; Bedford e/oi. 1999. 
DNA Polymerase PosseMes a S> tn V ffmnn^ l ease Activity «nd Is Tiiem.n«toM. 
Single-stranded M13 DNA and synthetic oUgonucleotides are used in the initial 
studies. Afterpolymeraseactivityisoptimized,fliesequencmgsystemcanbeusedtodirectty 
determine sequence information fi»m an isolated chromosome -a double-stranded DNA 
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molecule. Generally, heating a sample of double-stranded DNA is sufficient to produce or 
maintain the double-stranded DNA in stranded DNA form for sequencing. 

To favor the single-stranded state, the 5' to 3' exonuclease activity of the native Taq 
DNA polymerase in the enzyme engineered for single-molecule DNA sequencing is retained. 
This activity of the polymerase is exploited by the TagMan' assay. The exonuclease activity 
removes a duplex strand that may renature do>vnstream from the replication site using a nick- 
translation reaction mechanism. Synthesis from the engineered polymerase is initiated either 
by a synthetic oligonucleotide primer (if a specific reaction start is necessary) or by a nick in 
the DNA molecule (if multiple reactions are processed) to determine the sequence of an entire 
DNA molecule. 

The Polymerase Is Free from 3' to 5' Exonuclease Activity 

The Taq DNA polymerase is does not contain 3' to 5* exonuclease activity, which 
means that the polymerase cannot replace a base, for which fluorescent signal was detected, 
with another base which would produce another signature fluorescent signal. 

All polymerases make replication errors. The 3* to 5* exonuclease activity is used to 
proofread the newly replicated DNA strand* Since Taq DNA polymerase lacks this 
proofreading fimction, an error in base incorporation becomes an error in DNA replication. 
Error rates for Taq DNA polymerase are 1 error per ~1 00,000 bases synthesized, which is 
suflBciently low to assure a relatively high fidelity. See, e.g, , Eckert and Kunkel, 1990; Cline 
et al, 1 996. It has been suggested and verified for a polymerase that the elimination of this 
exonuclease activity uncovers a decreased fidelity dxiring incorporation. Thus, Taq 
polymerase must - by necessity - be more accurate during initial nucleotide selection and/or 
incorporation, and is therefore an excellent choice of use in the present inventions. 

The error rate of engineered polymerases of this invention are assayed by determining 
their error rates in synthesizing known sequences. The error rate determines the optimal 
number of reactions to be run in parallel so that sequencing information can be assigned witii 
confidence. The optimal number can be 1 or 10 or more. For example, the inventors have 
discovered that base context influences polymerase accuracy and reaction kinetics, and this 
information is used to assign confidence values to individual base calls. However, depending 
on the goal of a particular sequencing project, it noay be more important to generate a genome 
sequence as rapidly as possible. For example, it may be preferable to generate, or draft, the 
genome sequence of a pathogen at reduced accuracy for initial identification purposes or for 



wo 02/04680 



PCT/US01/2181I 



-44- 

fast screening of potential pathogens. 

Tffff PNA Polvmerase Is the Enzvme of Chnice for Single^mnl ccule DNA Seemenclng 

Engineering the polymerase to function as a direct molecular sensor of DNA base 
identity provides the fastest enzymatic DNA sequencing system possible. For the reasons 
detailed above, Taq DNA polymerase is the optimal enzyme to genetically modify and adapt 
for single-molecule DNA sequencing. Additionally, basic research questions concemmg 
DNA polymerase structure and function during replication can be addressed using this 
technology advancing single-molecule detection systems and molecular models in other 
disciplines. The inventors have found that native Taq DNA polymerase incorporates gamma- 
tagged dNTPs, yielding extended DNA polymers. Importantly, incorporation of a modified 
nucleotide is not detrimental to polymerase activity and extension of primer strands by 
incorporation of a y-tagged nucleotide conforms to Watson-Crick base pairing rules. 
DETECTING TAGGED POLYMERASE-NUCLEOTIDE INTERACTIONS 

One preferred method for detecting polymerase-nucleotide interactions involves a 
fluorescence resonance energy transfer-based (FRET-based) method to maximize signal and 
minimize noise. A FRET-based method exists when the emission from an acceptor is more 
intense than the emission from a donor, /. e, , the acceptor has a higher fluorescence quantum 
yield than the donor at the excitation frequency. The efficiency of FRET method can be 
estimated form computational models. See, e.g., Furey et al, 1998; Clegg et al, 1993; 
Mathiese/flf/., 1990. The efficiency ofenergy transfer (E) is computed from the equation (1): 

E^\i[u[rIR,]) (1) 

where Ro is the FOrster critical distance at E=0.5. Ro is calculated from equation (2): 

R, = {9.19xW\K'n'Q^J^,Y' (2) 

where « is the leftactive index of the medium {n=\A for aqueous solution), is a geometric 
orientation factor related to the relative angle of the two transition dipoles (K* is generally 
assumed to be 2/3), l^k [M" W] is the oveii^ integral representing tiie normalized spectral 
overlq) of the donor emission and acceptor absorption, and is the quantum yield. The 
overlap integral is computed fix)m equation (3): 
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where Fq is the donor emission, is the acceptor absorption. is calculated from equation 
(4): 

where Ip and Irp are the fluorescence intensities of donor and a reference compound 
(fluorescein in 0. IN NaOH), and A^p and Aq are the absorbances of the reference compound 
and donor. is the quantum yield of fluorescein in 0. IN NaOH and is taken to be 0.90. 

R, the distance between the donor and acceptor, is measured by looking at different 
configurations (e,g, , conformations) of the polymerase in order to obtain a conformationally 
averaged value. If both tags are on the polymerase, then R is the distance between the donor 
and acceptor in the open and closed conformation, while if the donor is on the polymerase 
and the acceptor on the dNTP» R is the distance between the donor and acceptor when the 
dNTP is bound to the polymerase and the polymerase is its closed form. 

The distance between the tagged y-phosphate and the selected amino acid sites for 
labeling in the open versus closed polymerase conformation delineates optimal dye 
combinations. If the distance (R) between the donor and acceptor is the same as Ro (Ro is the 
FSrster critical distance), FRET efficiency (E) is 50%. If R is more than 1.5 Ro, the energy 
transfer efficiency becomes negligible (E < 0.02). Sites within the enzyme at which R/R^ 
differ by more than 1 .6 in the open versus closed forms are identified and, if necessary, these 
distances and/or distance differences can be increased through genetic engineering. A plot 
of FRET efficiency verses distance is shown in Figure 1. 
Fluorescent Dve Selection Process 

Dye sets are chosen to maximize energy transfer efficiency between a tagged dNTP 
and a tag on the polymerase when the polymerase is in its closed configuration and to 
minimize energy transfer efficiency between the tag on the dNTP (either non-productively 
bound or in solution) and the tag on the polymerase when the polymerase is in its open 
configuration. Given a molarity of each nucleotide in the reaction medium of no more than 
about 1 ^M, an average distance between tagged nucleotides is calculated to be greater than 
or equal to about 250 A. Because Ms distance is several fold larger flian the distance 
separating sites on the polymerase in its open to closed conformational, minimal FRET 
background between the polymerase and free dNTPs is observed Preferably, nucleotide 
concentrations are reduced below 1 pM. Reducing dNTP concentrations to levels of at least 
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<10% of the K„ further minimizes background fluorescence and provides a convenient 
method for controlling the rate of the polymerase reaction for the real-time monitoring. Under 
such conditions, the velocity of the polymerization reaction is linearly proportional to the 
dNTP concentration and, thus, highly sensitive to regulation. Additionally, the use of a single 
excitation wavelength allows improved identification of unique tags on each dNTP. A single, 
lower-wavelength excitation laser is used to achieve high selectivity. 

In one preferred embodiment, a fluorescence donor is attached to a site on the 
polymerase comprising a replaced amino acid more amenable to donor attachment such as 
cysteine and four unique fluorescence acceptors are attached to each dNTP. For example, 
fluorescein is attached to a site on the polymerase and rhodamine, rhodamine derivatives 
and/or fluorescein derivatives are attached to each dNTP. Each donor-acceptor fluoiophore 
pair is designed to have an absorption spectra sufficiently distmct from the other pairs to 
allow separate identification after excitation. Preferably, the donor is selected such that the 
excitation light activates the donor, v^Wch tiien efficiency transfers the excitation energy to 
one of the acceptors. After energy transfer, the acceptor emits it unique fluorescence 
signature. The emission of flie fluorescence donor must significant overlap with the 
absorption spectra of the fluorescence acceptors for efficient energy transfer. However, the 
methods of this invention can also be performed using two, three or four unique fluorescence 
donor-acceptor pairs, by running parallel reactions. 

Fluorophore choice is a function of not only its enzyme compatibility, but also its 
spectral and photophysical propaties. For instance, it is critical that the acceptor fluorophore 
does not have any significant absorption at the excitation wavelength of the donor 
fluorophore, and less critical (but also desirable) is that the donor fluorophore does not have 
emission at the detection wavelength of the acceptor fluorophore. These spectral properties 
can be attenuated by chemical modifications of the fluorophore ring systems. 

Although the dNTPs are amenable to tagging at several sites mcluding the base, flie 
sugar and the phosphate grovq)s, the dNTPs are preferably tagged at either iJie P and/or y 
phosphate. Tagging the terminal phosphates of dNTP has a unique advantage. Whenflie 
incoming, tagged dNTP is bound to flie active site of the polymerase, significant FRET from 
the donor on the polymerase to the acceptor on the dNTP occurs. The unique fluorescence 
of the acceptor identifies which dNTP is incorporated. Once the tagged dNTP is incorporated 
into the growing DNA chain, the fluorescence acce^rtor, vMcb. is now attadied to the 



wo 02/04680 



PCT/USOl/21811 



-47- 

pyrophosphate group, is released to the medium with the cleaved pyrophosphate group. In 
fact, the growing DNA chain includes no fluorescence acceptor molecules at all. In essence, 
FRET occurs only between the donor on the polymerase and incoming acceptor-labeled 
dNTP, one at a time. This approach is better than the alternative attachment of the acceptor 
to a site within the dNMP moiety of the dNTP or the use of multiply-modified dNTPs. If the 
acceptor is attached to a site other than the p or y phosphate group, it becomes part of the 
growing DNA chain and the DNA chain will contain multiple fluorescence acceptors. 
Interference with the polymerization reaction and FRET measurements would likely occur. 

If the fluorescence from the tagged dNTPs in the polymerizing medium (background) 
is problematic, coUisional quenchers can be added to the polymerizing medium that do not 
covalently interact with the accq)tors on the dNTPs and quench fluorescence from the tagged 
dNTPs in the medium. Of course, the quenchers are also adapted to have insignificant contact 
with the donor on the polymerase. To minimize interaction between the collisional quenchers 
and the donor on the polymerase, the polymerase tag is preferably localized internally and 
shielded firom the collisional quenchers or the collisional quencher can be made sterically 
bulky or associate with a sterically bulky group to decrease interaction between the quencher 
and the polymerase. 

Another preferred method for detecting polymerase-nucleotide interactions involves 
using nucleotide-specific quenching agents to quench the emission of a fluorescent tag on the 
polymerase. Thus, the polymerase is tagged with a fluorophore, while each dNTP is labeled 
with a quencher for the fluorophore. Typically, DABCYL (4-(4'-dimethylaminophenylazo) 
benzoic acid is a universal quencher, which absorbs energy from a fluorophore, such as 5-(2 - 
aminoethyl) aminonaphthalene-1 -sulfonic acid (AEANS) and dissipates heat Preferably, a 
quencher is selected for each dNTP so that when each quencher is brought into close 
proximity to the fluorophore, a distinguishable quenching efflciency is obtained. Therefore, 
the degree of quenching is used to identify each dNTP as it is being incorporated into the 
growing DNA chain. One advantage of this preferred detection method is that fluorescence 
emission comes &om a single source rendering back^ound noise negligible. Although less 
preferred, if only two or three suitable quenchers are identified, then two or three of the four 
dNTPs are labeled and a series of polymerization reaction are made each time with a different 
pair of the labeled dNTPs. Combining the results firom tiiese runs, generates a complete 
sequence of the DNA molecule. 
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SITE SELECTION FOR LABELING THE TAQ POLYMERASE AND dNTPs 

Although the present invention is directed to attaching any type of atomic and/or 
molecular tag that has a detectable property, the processes for site selection and tag 
attachment are illustrated using a preferred class of tags, namely fluorescent tags. 
Fluorescent Labeling of Polvmeraac md/or dNTPa 

The fluorescence probes or quenchers attached to the polymerase or dNTPs are 
designed to minunize adverse effects on the DNA polymerization reaction. The inventors 
have developed synthetic methods for chemically tagging the polymerase and dNTPs with 
fluorescence probes or quenchers. 

In general, the polymerase is tagged by replacing a selected amino acid codon in the 
DNA sequence encodmg the polymerase with a codon for an amino acid that more easily 
reacts with a molecular tag such as cysteine via mutagenesis. Once a mutated DNA sequence 
is prepared, the mutant is inserted into E, coli for expression. After expression, the mutant 
polymerase is isolated and purified. The purified mutant polymerase is then tested for 
polymerase activity. After activity verification, the mutant polymerase is reacted with a slight 
molar excess of a desired tag to achieve near stoichiometric labeling. Alternatively, the 
polymerase can be treated with an excess amount of the tag and labeling followed as a 
fimction of time. The tagging reaction is than stopped when near stoichiometric labeling is 
obtained. 

If the mutant polymerase includes several sites including the target residue that can 
undergo tagging with the desired molecular tag, then the tagging reaction can also be carried 
out under special reaction conditions such as using aprotecting group or competitive inhibitor 
and a reversible blocking group, which are later removed. If the target amino acid residue 
m the mutant polymerase is close to the active dNTP bmding site, a saturating level of a 
protecting group or a competitive inhibitor is first added to protect the target residue and a 
reversible blocking group is subsequently added to inactivate non-target residues. The 
protecting groiqj or competitive inhibitor is then removed firom the target residue, and the 
mutant polymerase is treated with the desured tag to label the target residue. Finally, the 
blocking groups are chemically removed firom non-t^get residues in the mutant polymerase 
and removed to obtam a tagged mutant polymerase with flie tag substantially to completely 
isolated on the target residue. 

Alternatively, if the target residue is not near the active site, the polymerase can be 
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treated with a blocking group to inactivate non-target residues. After removal of unreacted 
blocking group, the mutant polymerase is treated with the desired tag for labeling the target 
residue. Finally, the blocking groups are chemically removed from the non-target residues 
in the mutant polymerase and removed to obtain the tagged mutant polymerase. 
Amino Acide Site Selection for the Tag Polymerase 

The inventors have identified amino acids in the Taq polymerase that are likely to 
withstand mutation and subsequent tag attachment such as the attachment of a fluorescent tag. 
While many sites are capable of cysteine replacement and tag attachment, preferred sites in 
the polymerase were identified using the following criteria: (1) they are not m contact with 
other proteins; (2) they do not alter the conformation or folding of the polymerase; and (3) 
they are not involved in the function of the protein. The selections were accomplished using 
a combination of mutational studies including sequence analyses data, computational studies 
including molecular docking data and assaying for polymerase activity and fidelity. After site 
mutation, computational studies will be used to refine the molecular models and help to 
identify other potential sites for mutation. 

Regions of the protein surface that are not important for fimction were identified, 
indirectly, by investigating the variation in sequence as a fimction of evolutionary time and 
protein fimction using the evolutionary trace method. See, e.g, , Lichtarge et al., 1996. In this 
approach, amino acid residues that are important for structure or function are found by 
comparing evolutionary mutations and structural homologies. The polymerases are ideal 
systems for this type of study, as there are many crystal and co-crystal structures and many 
available sequences. The inventors have excluded regions of structural/fimctional importance 
from sites selection for mutation/labeling. In addition, visual inspection and overlays of 
available crystal structures of the polymerase in different conformational states, provided 
further assistance in identifying amino acid sites near the binding site for dNTPs. Some of the 
chosen amino acids sites are somewhat internally located and preferably surroxmd active 
regions in the polymerase that undergo changes during base incorporation, such as the dNTP 
binding regions, base incorporation regions, pyrophosphate release regions, etc. These 
internal sites are preferred because a tag on these sites show reduced background signals 
during detection, reduce interactionbetweenthe polymerase en2yme andnon-specifically 
associated tagged dNTPs, when fluorescently tagged dNTPs are used. 

Once tagged mutant polymerases are prepared and energy minimized in a full solvent 
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environment, estimates of the effect on the structure of the polymerase due to the mutation 
and/or labeling are generated to provide information about relative tag positioning and 
separation. This data is then used to estimate FRET efficiencies prior to measurement. Of 
course, if the dNTPs are tagged with quenchers, then these considerations are not as 
important. 

Another aspect of this invention mvolves the construction of molecular mechanics 
force field parameters for atomic and/or molecular tags such as fluorescent tags used to tag 
the dNTPs and the polymerase and parameters for the fluorescent tagged amino acid on the 
polymerase and/or dNTP. Force field parameters are using quantum mechanical studies to 
obtain partial charge distributions and energies for relevant intramolecular conformations 
(/.e., for the dihedral angle definitions) derived fit)m known polymerase crystal structures. 

Ionization states of each ionizable residue are estimated using an electrostatic model 
in which the protein is treated as a low dielectric region and the solvent as a high dielectric, 
using the UHBD program. See, e.g., Antosiewiczef a/., 1994; Briggs and Antosiewicz, 1999; 
Madura et al^ 1995. The electrostatic firee energies of ionization of each ionizable residue 
are computed by solving the Poisson-Boltzmann equation for each residue. These individual 
ionization firee energies are modified to take into account coupled titration behavior resulting 
in a set of self-consistent predicted ionization states. These predicted ionization fi:ee energies 
are then recalculated so that shifts in ionization caused by the binding of a dNTP are taken 
into account. Unexpected ionization states are subject to further computational and 
experimental studies, leading to a set of partial charges for each residue in the protein, z.e., 
each ionizable residue in the protein can have a different charge state depending on the type 
of attached tag or amino acid substitution. 

To further aid in amino acid site selection, an electrostatic potential map is generated 
firom properties of the molecular surface of the Taq polymerase/DNA complex, screened by 
solvent and, optionally, by dissolved ions (/.e., ionic strength) using mainly the UHBD 
program. The map provides guidance about binding locations for the dNTPs and the 
electrostatic environment at proposed mutation/labeling sites. 

The molecular models generated are designed to be continually refined takmg into 
accoimt new experimental data, allowing the construction of improved molecular models, 
improved molecular dynamics calculations and improved force field parameters so tihat the 
models better predict system behavior for refining tag chemistry and/or tag positioning. 
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predicting new polymerase mutants, base incorporation rates and polymerase fidelity. 

Molecular docking simulations are used to predict the docked orientation of the 
natural and fluorescently labeled dNTPs, within the polymerase binding pocket. The best- 
docked configurations are energy minimized in the presence of an explicit solvent 
environment. In conjxmction with amino acid sites in the polymerase selected for labeling, 
the docking studies are used to analyze how the tags interact and to predict FRET efficiency 
for each selected amino acid site. 

With the exception of &e electrostatics calculations, all docking, quantum mechanics, 
molecular mechanics, and molecular dynamics calculations are and will be performed using 
the HyperChem (v6.0) computer program. The HyperChem software nins on PCs under a 
Windows operating system. A number of computer programs for data analysis or for FRET 
prediction (as described below) are and will be written on a PC using the Linux operating 
system and the UHBD program running under Linux. 
Analysis of Polsanerase Structures 

Co-crystal structures solved for DNA polymerase I (DNA pol I) firom E. coli, T. 
aquaticus, B. stearothermophilm, T7 bacteriophage, and human pol P demonstrate that 
(replicative) polymerases share mechanistic and structural features. The structures that 
capture Tag DNA polymerase in an 'open' (non-productive) conformation and in a 'closed* 
(productive) conformation are of particular importance for identifying regions of the 
polymerase that undergo changes during base incorporation. The addition of the nucleotide 
to the polymerase/primer/template complex is responsible for the transition from its open to 
its closed conformation. Comparison of these structures provides information about the 
conformational changes that occur within the polymerase during nucleotide incorporation. 
Specifically, in the closed conformation, the tip of the fingers domain is rotated inward by 
46°, thereby positioning the dNTP at the 3' end of the primer strand in the polymerase active 
site. The geometry of this terminal base pair is precisely matched with that of its binding 
pocket The binding of the correct, complementary base &cilitates formation of the closed 
conformation, whereas incorrect dNTP binding does not induce this conformational change. 
Reaction chemistry occurs vAnsa the enzyme is in the closed conformation* 

Referring now to Figure 2, the open and closed ternary complex forms of the large 
firagment of Tag DNA pol I (Klentaq 1) are shown in a superimposition of their Ca tracings. 
The ternary complex contains the enzyme, the ddCTP and tiie primer/template duplex DNA. 
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The open structure is shown in magenta and the closed structure is shown in yellow. The 
disorganized appearance in the upper left portion of the protem shows movement of the 
'fingers* domain in open and closed conformations. 

Using a program to determine the change in position of amino acids in the open and 
closed conformation of the polymerase relative to the gamma phosphate of a bound ddGTP 
fi-om two different crystal structures of the Taq polymerase containing the pruner and boxmd 
ddGTP, lists of the 20 ammo acid sites that undergo the largest change m position for 
mutation and labeling were identified. The distances were calculated for each amino acid 
between then: alpha and beta carbon atoms and the gamma phosphate group of the bound 
ddGTP. Lists derived fix)m the two different sets of crystallographic data for the Taq 
polymerase are given in Tables I, H, m and IV. 

TABLE I 

The 20 Amino Acid Sites Undergoing the Largest Positional Change in 2ktq Data 
Between the Open Form of the Polymerase to the Closed Form of the Polymerase Relative 

to the Alpha Carbon of the Residue 



Residue 
Location 


Residue 
Identity 


Change in 
Distance 

(A) 


Residue 
Location 


Residue 
Identity 


Change in 
Distance (A) 


517 


Alanine 


9.10 


491 


Glutamic acid 


2.90 


516 


Alanine 


6.86 


486 


Serine 


2.78 


515 


Serine 


6.53 


490 


Leucine 


2.62 


513 


Serine 


6.40 


586 


Valine 


2.61 


518 


Valine 


5.12 


492 


Aiginine 


2.60 


514 


Threonine 


3.94 


462 


Glutamic acid 


2.59 


488 


Asparagine 


3.73 


483 


Asparagine 


2.47 


487 


Arginine 


3.50 


685 


Proline 


2.46 


489 


Glutamine 


3.13 


587 


Arginine 


2.44 


495 


Phenylalanin 
e 


3.05 


521 


Alanine 


2.38 



TABLE n 

The 20 Amino Acid Sites Undergoing the Largest Positional Change in 2ktq Data 
Between the Open Form of the Polymerase to tiie Closed Form of the Polymerase Relative 

to the Beta Carbon of the Residue 
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Residue 
Location 


Residue 
Identity 


Change in 
Distance 

(A) 


Residue 
Location 


Residue 
Identity 


Change In 
Distance 

(A) 


517 


Alanine 


10.98 


491 


Glvrtamic 
Acid 


3.41 


516 


Alanine 


9.05 


587 


Arginine 


3.39 


515 


Serine 


8.02 


521 


Alanine 


3.33 


513 


Serine 


7.46 


498 


Leucine 


3.21 


518 


Valine 


5.47 


489 


Glxitamine 


3.08 


685 


Proline 


5.16 


514 


Threonine 


2.97 


487 


Arginine 


4.24 


581 


Leucine 


2.93 


495 


Phenylalanin 
e 


3.94 


483 


Asparagine 


2.92 


488 


Aspartic Acid 


3.88 


497 


Glutamic 
Acid 


2.91 


520 


Glutamic 
Acid 


3.66 


462 


Glutamic 
Acid 


2.83 



TABLE in 

The 20 Amino Acid Sites Undergoing the Largest Positional Change in 3ktq Data 
Between the Open Form of the Polymerase to the Closed Form of the Polymerase Relative 

to the Alpha Carbon of the Residue 



. Residue 
Location 


Residue 
Identity 


Change in 
Distance 

(A) 


Residue 
Location 


Residue 
Identity 


Change in 
Distance 

(A) 


517 


Alanine 


8.95 


515 


Serine 


6.36 


656 


Proline 


8.75 


653 


Alanine 


6.16 


657 


Leucine 


8.59 


661 


Alanine 


5.94 


655 


Aspartic Acid 


8.05 


652 


Glutamic 
Acid 


5.44 


660 


Arginine 


7.35 


647 


Phenylalanin 
e 


5.25 


658 


Metionine 


7.06 


649 


Valine 


5.22 
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Residue 
Location 


Residue 
Identity 


Change in 
Distance 

(A) 


Residue 
Location 


Residue 
Identity 


Cliange in 
Distance 

(A) 


659 


Arginine 


6.69 


518 


Valine 


5.15 


654 


Valine 


6.60 


644 


Serine 


5.08 


513 


Serine 


6.59 


643 


Alanine 


5.01 


516 


Alanine 


6.57 


650 


Proline 


4.72 



TABLE IV 

The 20 Amino Acid Sites Undergoing the Largest Positional Change in 3ktq Data 
Between the Open Form of the Polymerase to the Closed Form of the Polymerase Relative 

to the Beta Carbon of the Residue 



Residue 
Location 


Residue 
Identity 


Change in 
Distance 

(A) 


Residue 
Location 


Residue 
Identity 


Cliange in 
Distance 

(A) 


517 


Alanine 


10.85 


654 


Valine 


6.25 


656 


Proline 


9.05 


653 


Alanine 


6.14 


657 


Leucine 


8.75 


661 


Alanine 


6.04 


516 


Alanine 


8.68 


643 


Alanine 


5.74 


655 


Aspartic Acid 


8.24 


649 


Valine 


5.55 


515 


Serine 


7.92 


647 


Phenylalanin 
e 


5.45 


660 


Arginine 


7.89 


518 


Valine 


5.42 


513 


Serine 


7.60 


652 


Glutamic 

Acid 


5.13 


659 


Argmine 


6.98 


644 


Serine 


4.89 


658 


Metionine 


6.77 


487 


Arginine 


4.77 



The above listed amino acids represent preferred amino acid sites for cysteine 
replacement and subsequent tag attachment, because these sites represent the sites in the Taq 
polymerase the xmdergo significant changes in position dxiring base incorporation. 

To further refine the amino acid site selection, visualization of flie polymerase in its 
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open and closed conformational extremes for these identified amino acid sites is used so that 
the final selected amino acid sites maximize signal and minimize backgroxmd noise, when 
modified to carry fluorescent tags for analysis using the FRET methodology. Amino acid 
changes that are not predicted to significantly affect the protein*s secondary structure or 
activity make up a refined set of amino acid sites in the Taq polymerase for mutagenesis and 
fluorescent modification so that the tag is shielded firom interaction with free dNTPs. The 
following three panels illustrate the protocol used in this invention to refine amino acid site 
selection fi'om the about list of amino acids that undergo the largest change in position 
relative to a bound ddGTP as the polymerase transitions fix>m the open to the closed form. 

Referring now to Figures 3A-C, an overlay between 3ktq (closed "black*) and Itau 
(open *light blue*), the large firagment of Taq DNA polymerase I is shown. Looking at Figure 
3A, the bound DNA firom 3ktq is shown in red while the ddCTP bound to 3ktq is in green. 
Three residues were visually identified as moving the most when the polymerase goes from 
open (Itau) to closed (3ktq), namely, Asp655, Pro656, and Leu657. Based on further 
analyses of the structures, Pro656 appears to have the role of capping the O-helix. Leu65Ts 
side chain is very close to another part of the protein in the closed (3ktq) form. Addition of 
a larger side chain/tag is thought to diminish the ability of the polymerase to achieve a fully 
closed, active conformation. Conversely, Asp655 is entirely solvent exposed in both the 
closed and open conformations of the polymerase. Looking at Figure 3B, a close-up view of 
the active site firom the overlay of the 3ktq (closed) and Itau (open) conformations of Taq 
polymerase is shown. The large displacements between the open and closed conformations 
are evident Looking at Figure 3C, a close-up view of a molecular surface representation of 
3ktq (in the absence of DNA and ddCTP). The molecular surface is colored in two areas, blue 
for Asp655 and green for Leu657. In this representation, it is evident that Leu657 is in close 
proximity to another part of the protein, because the green part of the molecular surface, in 
the thumb domain, is *'connected" to a part of the fingers domain. This view shows this 
region of the polymerase looking into the palm of the hand with fingers to the right and 
thumb to the left MUTAGENESIS AND SEQUENCING OF POLYMERASE 
VARIANTS 

The gene encoding Taq DNA polymerase was obtained and will be expressed in 
pTTQ ISinjE. co/i strain DHL See,c.g.,Engelkeetal., 1990. The inventors have identified 
candidate amino acids for mutagenesis including ttie amino acids in Tables I-IV, the refined 
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lists or mixtures or combinations thereof. The inventors xising standard molecular methods 
well-known in the art introduced a cysteme codon, individually, at each of target amino acid 
sites. See, e.g. , Sambrook et al., 1989 and Allen et al., 1998. DNA is purified from isolated 
colonies expressing the mutant polymerase, sequenced using dye-terminator fluorescent 
chemistry, detected on an ABI PRISM 377 Automated Sequencer, and analyzed using 
Sequencher™ avaUable from GeneCodes, Inc. 
EXPRESSION AND PURIFICATION OF ENZYME VARIANTS 

The inventors have demonstrated that the Taq polymerase is capable of incorporating 
Y-tagged dNTPs to synthesize extended DNA sequences. The next step involves the 
construction of mutants capable of carrying a tag designed to interact with the tags on the 
dNTPS and optimization of the polymerase for single-molecule sequencing. The mutants are 
constructed using standard site specific mutagenesis as described above and in the 
experimental section. The constructs are then inserted into and expressed in E, coli. Mutant 
Taq polymerase is then obtained after sufiQcient£ coli is grown for subsequence polymerase 
isolation and purification. 

Although E. coli can be grown to optical densities exceeding 100 by 
computer-controlled feedback-based supply of non-fermentative substrates, the resulting three 
kg ofE. coli cell paste will be excessive during polymerase optimization. Of course, when 
optimized polymerases construct are prepared, then this large scale production will be used. 
During the development of optimized polymerases, the mutants are derived from E, coli cell 
masses grown in 10 L well-oxygenated batch cultures usmg a rich medium available from 
Amgen. For fast polymerase mutant screening, the mutants are prepared by growing E. coli 
in 2 L baffled shake glasses. Cell paste are then harvested using a 6 L preparative centrifuge, 
lysed by French press, and cleared of cell debris by centrifiagation. To reduce interference 
from E, coli nucleic acid sequences, it is preferably to also remove other nucleic acids. 
Removal is achieved using either nucleases (and subsequent heat denaturation of the 
nuclease) or, preferably using a variation of flie compaction agent-based nucleic acid 
precipitation protocol as described in Murphy et al.. Nature Biotechnology 17, 822, 1999. 

Because the thermal stability of Taq polymerase is considerably greater than typical 
E, coli proteins, purification of Taq polymerase or its mutants from contaminating Taq 
polynierase protems is achieved by a simple heat treatment of the crude polymerase at 75**C 
for 60 minutes, whichreduces£. ca/i protein contanimation by q)proxiniately 100-fold. This 
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reduction in £. coli protein contamination combined with the high initial expression level, 
produces nearly pure Tag polymerase or its mutants in a convenient initial step; provide, of 
course, that the mutant polymerase retains the thermal stability of the native polymerase. 

For routine sequencing and PCR screening, further limited purification is generally 
required. A single anion-excbange step, typically on Q Sepharose at pH 8.0, is generally 
sufficient to produce a product pure enough to these tests. Preferably, a second purification 
step will also be performed to insure that contamination does not cloud the results of 
subsequent testing. The second purification step involves SDS-PAGE and CD-monitored 
melting experiments. 

SELECTION OF SITE IN dNTP TO ACCEPT FLUORESCENT TAG 

Molecular docking simulations were carried out to predict the docked orientation of 
the natural and fluorescentiy labeled dNTPs using the AutoDock computer program (Morris 
et al., 1998; Soares et al., 1999). Conformational flexibility is permitted during the docking 
simulations making use of an efGcient Lamarckian Genetic algorithm implemented in the 
AutoDock program. A subset of protein side chains is also allowed to move to accommodate 
the dNTP as it docks. The best docked configurations is then energy minimized in the 
presence of a solvent environment. Experimental data are available which identify amino 
acids in the polymerase active site that are involved in catalysis and in contact with the 
template/primer DNA strands or the dNTP to be incorporated. The computer-aided chemical 
modeling such as docking studies can be used identify and support sites in the dNTP that can 
be labeled and to predict the FRET efficiency of dNTPs carrying a specific label at a specific 
site. 

In general, the dNTPs are tagged either by reacting a dNTP with a desired tag or by 
reacting a precursor such as the pyrophosphate group or the base with a desired tag and then 
completing the synthesis of the dNTP. 

Chemical Modification of Nucleotides for DNA Polymerase Reactions 

The inventors have developed syntheses for modifying fluorophore and fluorescence 
energy transfer compounds to have distinct optical properties for differential signal detection, 
for nucleotide/nucleoside synthons for incorporation of modifications on base, sugar or 
phosphate backbone positions, and for producing complementary sets of four deoxynucleotide 
triphosphates (dNTPs) containing substituents on nucleobases, sugar or phosphate backbone. 
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Svatheais of v-Phosphate Modified 

The inventors have found that the native Tag polymerase is capable of polymerizing 
phosphate-modified dNTPs or ddNTPs. Again, tagging the dNPTs or ddNTPs at the beta 
and/or gamma phosphate groups is a preferred because the replicated DNA contains no 
unnatural bases, polymerase activity is not significantly adversely affected and long DNA 
strands are produced. The inventors have syntiiesized y-ANS-phosphate dNTPs, where the 
ANS is attadied to the phosphate through a phosphamide bond. Alfliough tiiese tagged 
dNTPs are readily incorporated by the native Taq polymerase and by HIV reverse 
transcriptase, ANS is only one of a wide range of tags that can be attached through tather tiie 
P and/or y phosphate groiq}s. 

The present invention uses tagged dNTPs or ddNTPs in combination with polymerase 
for signal detection. The dNTPs are modified at phosphate positions (alpha, beta and/or 
gamma) and/or other positions of nucleotides through a covalent bond or affinity association. 
The tags are designed to be removed fi-om the base before the next monomer is added to the 
sequence. One method for removing the tag is to place the tag on the gamma and/or beta 
phosphates. The tag is removed as pyrrophosphate dissociates &om the growing DNA 
sequence. Another method is to attach the tag to a position of on the monomer through a 
cleavable bond. The tag is then removed after incorporation and before the next monomer 
incorporation cleaving the cleavable bond using light, a chemical bond cleaving reagent in 
the polymerization medium, and/or heat. 

One generalized synthetic routine to synthesizing other y-tagged dNTPs is given 



o o To O n^u- 

FR^..C02H+ H0.^0.{j.0X (a)DCC(CH^Cl2i,.i{^o.j{.OX (b)H*rr HF ^jXoXoJ-Os I 

X = counterlonorH ( 9 °^ °* l^H 



> — f II 



rid 



'OH h" 



below: 

where FR is a fluorescent tag, L is a linker grovq), X is either H or a counterioin depending 
on the pH of the reaction medium, Z is a group capable of reactfon with Ihe hydroxyl group 
of the pyrophosphate and Z is group after reaction with Has dNMP. Preferably, Z is CI, Br, 
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I, OH, SH, NH2, NHR, CO2H, CO2R, SiOH, SiOR, GeOH, GeOR, or similar reactive 
ftinctional groups or leaving group, where R is an alkyl, aryl, aralkyl, alkaryl, halogenated 
analogs thereof or hetero atom analogs thereof and T is 0, NH, MR, COj, SiO, GeO, where 
R is an alkyl, aryl, aralkyl, alkaryl, halogenated analogs thereof or hetero atom analogs 
thereof. 

The synthesis involves reacting Z terminated fluorescent tag, FR-L-Z with a 
pyrophosphate group, PjO^sH, in DCC and dichloromethane to produce a fluorescent tagged 
pyrophosphate. After the fluorescent tagged pyrophosphate is prepared, it is reacted vsdth a 
morpholine terminated dNMP in acidic THF to produce a dNTP having a fluorescent tag on 
its Y-phosphate. Because the final reaction bears a fluorescent tag and is larger than starting 
materials, sq)aration from unmodified starting material and tagged pyrophosphate is straight 
forward. 

A generalized synthesis of a the FR-L group is shown below: 







.OCOIb 


^ modified dN or dNTP 


ijy^ (a) , X 


1 r T T 1 1 


J 






^^—^ HO(CH2)8NHC(0)0'^^— ^ ^ 




modified amino adds 


(a) Isobutyryt anhydride 


(b) N-hydrDxyl8UCCinlin)den3CC/CH2Ci2 






dilsoprDpyl emlne/^yridlne 


(c}H0(CH2)6NH2/CH2Cl2 







Fluorescein (FR) is first reacted with isobutyryl anhydride in pyridine in the presence 
of diisopropyl amine to produce a fluorescein having both ring hydroxy groups protected for 
subsequent linker attachment. The hydroxy protected fluorescein is then reacted with N- 
hydroxylsuccinimide in DCC and dichloromethane to produce followed by the addition of 1- 
hydroxy-6-amino hexane to produce an hydroxy terminated FR-L group. This group can then 
be reacted either with pyrophosphate to tag the dNTPs at their y-phosphate group or to tag 
amino acids. See, e.g.. Ward et al., 1987; Engelhardt et al., 1993; Little et al., 2000; Hobbs, 
1991. 

By usmg di£ferent fluorescent tags on each dNTP, tags can be designed so that each 
tag emits a distinguishable emission spectra. The emission spectra can be distinguished by 
producing tags with non-overlapping emission fi»quencies - multicolor - or each tag can 
have a non-overlapping spectral feature such a unique emission band, a unique absorption 
band and/or a unique intensity feature. System that use a distinguishable tag on each dNTP 
improves confidence values associated with the base calling algorithm. 
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The synthetic scheme shown above for fluorescein is adaptable to other dyes as well 
such as tetrachlorofluorescein (JOE) or N,N,N'J>r.tetramethyl-6-carboxyrhodamine 
(TAMRA). Typically, the gamma phosphate tagged reactions are carried out in basic aqueous 
solutions and a carbodiimide, such as DEC. Other fluorophore molecules and dNTPs can be 
similarly modified. 

Synthesis of dNTP Tagged at on the Base 

Although tagging the dNTPs at the beta and/or ganuna phosphate is preferred, the 
dNTPs can also be tagged on the base and/or sugar moieties while mflintflimng their 
polymerase reaction activity. The sites for modifications are preferably selected to not 
interfere with Watson-Crick base pairing. A generalized scheme for base modification is 
shown below: 




Polymerase Activitv As says Using a Fluorescentlv-tapged Enzvme 

The activities of polymerase variants are monitored throughout polymerase 
development. Polymerase activity is assayed after a candidate amino acid is mutated to 
cysteme and after fluorescent tagging of the cysteine. The assay used to monitor the ability 
of flie native Tag polymerase to incorporate fluorescently-tagged dNTPs is also used to screen 
polymerase variants. Smce the mutant Taq polymerases have altered amino acid sequences, 
the assays provide mutant characterization data such as thermostability, fidelity, 
polymerization rate, affinity for modified versus natural bases. 

Mutant Taq polymerase activity assays are carried out under conditions similar to 
those used to examine the incorporation of fluorescently-tagged dNTPs into DNA polymers 
by the native Taq polymerase. To examine mutant Taq polymerase activity, the purified 
mutant Taq polymerase is incubated in polymerase reaction buffer with a 5*-^^P end-labeled 
primer/single-stranded template duplex, and appropriate tagged dNTP(s). The polymerase's 
ability to incorporate a fluorescently-tagged dNTP is monitored by assaying flie relative 
amount of fluorescence associated with flie extended primer on eitiier an ABD77 DNA 
Sequencer (for fluorescentiy tagged bases), a Fuji BASIOOO phosphorimaging system, or 
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ether appropriate or similar detectors or detection systems. This assay is used to confirm that 
the mutant polymerase incorporates tagged dNTP and to confirm that fluorescent signatures 
are obtained during base incorporation. These assays use an end-labeled primer, the 
fluorescently-tagged dNTP and the appropriate base beyond the fluorescent tag. The products 
are then size separated and analyzed for extension. Reactions are either performed under 
constant temperature reaction conditions or thermocycied, as necessary. 
Primer Extension Assays 

The ability of Tag DNA polymerase to incorporate a y-phosphate dNTP variant is 
assayed using conditions similar to those developed to examine single base incorporation by 
a fluorescently-tagged DNA polymerase. See, e.g. Furey et al, 1998. These experiments 
demonstrate that polymerases bearing a fluorescent tag do not a priori have reduced 
polymerization activity. The inventors have demonstrated that the native Taq polymerase 
incorporates y-tagged dNTP, singly or collectively to produce long DNA chains. 

To examine polymerase activity, the polymerase is incubated in polymerase reaction 
buffer such as Taq DNA polymerase buffer available from Promega Corporation of Madison, 
Wisconsin with either a 5 -^^P or a fluorescently end-labeled primer (TOPysingle-stranded 
template (BOT-'X') duplex, and appropriate dNTP(s) as shown in Table V. Reactions are 
carried out either at constant temperature or thermocycled, as desired or as is necessary. 
Reaction products are then size-separated and quantified using a phosphorimaging or 
fluorescent detection system. The relative eflBciency of incorporation for each tagged dNTP 
is determined by comparison with its natural counterpart 



TABLE V 



Primer Strand: 

TOP 5' 
Template Strands: 



GGT ACT AAG CGG CCG CAT G 3' 



BOT-T 3' 
BOT-C 3' 
BOT-G 3 ' 
BOT-A 3' 
B0T-3T 3' 



CCA TGA TTC GCC GGC GTA CTC 5*- 
CCA TGA TTC GCC GGC GTA CCC 5' 
CCA TGA TTC GCC GGC GTA CGC 5' 
CCA TGA TTC GCC GGC GTA CAC 5' 



CCA TGA TTC GCC GGC GTA CTT TC 5' 
CCA TGA TTC GCC GGC GTA CCT AG 5' 



BOT-Sau 3' 



In Table V, TOP' represents the primer strand of an assay DNA diQ)lex. Variants of llie 
template strand are represented by BOT. The relevant feature of the DNA template is 
indicated after the hyphen. For example, BOT-T, BOT-C, BOT-G, BOT-A are used to 
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monitor polymerase incorporation efficiency and fidelity for either nucleotides or nucleotide 
variants of dA, dG, dC, and dT, respectively. 

Preliminary assays are performed prior to exhaustive purification of the tagged dNTP 
to ensure that the polymerase is not inhibited by a chemical that co-purifies with the tagged 
dNTP, using the 'BOT-Sau' template. The BOT-Sau' template was designed to monitor 
incorporation of natural dGTP prior to tagged dATP (/.«., a positive control for polymerase 
activity). More extensive purification is then performed for promising tagged nucleotides. 
Similarly, experiments are carried out to determine whether the polymerase continues 
extension following incorporation of the tagged dNTPs, individually or collectively, using 
the same end-labeled TOP primer, the appropriate 'BOP primer, the fluorescentiy-tagged 
dNTP, and the appropriate base 3' of the tagged nucleotide. The products are tiien size- 
separated and analyzed to determine the relative extension efficiency. 
Assay FideUtv of v-nlift«nh« te Tagged NiiHeotide Tnt^orp ^^^^« 

The T aq DNA polymerase lacks 3' to 5' exonuclease activity (proofreading activity). 
If flie polymerase used in smgle-molecule DNA sequencing possessed a 3' to 5' exonuclease 
activity, the polymerase would be capsble of adding anotiier base to replace one that would 
be removed by tiie proofiieading activity. This newly added base would produce a signature 
fluorescent signal evidencing tiie incorporation of an additional base in the template, resulting 
in a misidentified DNA sequence, a situation tiiat would render the single-molecule 
sequencing systems of this invention problematic. 

If the error rate for the incorporation of modified dNTPs exceeds a threshold level of 
about 1 error in 100, tiie sequencing reactions are preferably run in paraUel, witii the optimal 
number required to produce sequence information with a high degree of confidence for each 
base call determined by tiie error rate. Larger error rates require more parallel run, while 
smaUer error rates require fewer parallel runs. In fact, if tiie error rate is low enough, 
generally less tiian 1 error in 1,000, preferably 1 error in 5,000 and particularly 1 error in 
10,000 incorporated base, tiien no parallel runs are required Insertions or deletions arc, 
potentiaUy, more serious types of errors and warrant a minimal redundancy of 3 repeats per 
sample. If 2 reactions were run, one could not be certain which was correct Hius, 3 
reactions are needed for tiie high quality data produced by this qrstem. 

The BOT-vaiiant tempUtes are used to characterize the accuracy at which each y- 
tagged dNTP is incorporated by an engineered polymerase as set forth in Table V. 
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Oligonucleotides serve as DN A templates, and each differing only in the identity of the first 
base incorporated. Experiments using these templates are used to examine the relative 
mcorporation efficiency of each base and the ability of the polymerase to discriminate 
between the tagged dNTPs. Initially, experiments with polymerase variants are carried out 
using relatively simple-sequence, single-stranded DN A templates. A wide array of sequence- 
characterized templates is available from the University of Houston in Dr. Hardin's 
laboratory, including a resource of over 300 purified templates. For example, one series of 
templates contains variable length polyA or polyT sequences. Additional defined-sequence 
templates are constructed as necessary, fecilitating the development of the base-calling 
algorithms. 

Relative Fluorescence Intensity Assays 

Direct detection of polymerase action on the tagged dNTP is obtained by solution 
fluorescence measurements, using SPEX 212 instrument or similar instrument. This 
instrument was used to successfully detect fluorescent signals firom ANS tagged y-phosphate 
dNTPs, being incorporated by Taq polymerase at nanomolar concentration levels. The SPEX 
212 instrument includes a 450 watt xenon arc source, dual emission and dual excitation 
monochromators, cooled PMT (recently upgraded to sunultaneous T-format anisotropy data 
collection), and a Hi-Tech stopped-flow accessory. This instrument is capable of detecting 
an increase in fluorescence intensity and/or change in absorption spectra upon liberation of 
the tagged pyrophosphate from ANS tagged y-phosphate dNTPs, as was verified for ANS- 
pyrophosphate released by Taq and RNA polymerase and venom phosphodiesterase. 

Experiments have been and are being performed by incubating y-phosphate tagged 
dATP or TTP (Control: non-modified dATP and TTP) in an appropriate buffer (e.g., buffers 
available firom Promega Corporation) in the presence of polymerase (Control: no enzyme) 
and DNA primer/template [poly(dA). poly(dT)] (Control: no primer/template DNA). When 
the polymerase incorporates a tagged dNTP, changes in fluorescence intensity and/or 
fiiequency, absorption and/or emission spectra, and DNA polymer concentration are detected 
Changes in these measurables as a fiinction of time and/or temperature for experimental 
versus control cuvettes allows for unambiguous determination of whether a polymerase is 
incorporating the y-phosphate tagged dNTP. Excitation and fluorescence emission can be 
optiniized for each tagged dNTP based on changes in these measurables. 
Deyelopment of a Single-Molecule Detection System 
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The detection of fluorescence from single molecules is preferably carried out using 
microscopy. Confocal-scanning microscopy can be used in this application, but a non- 
scanning approach is preferred. An microscope useful for detecting fluorescent signals due 
to polymerase activity include any type of microscope, with oil-immersion type microscopes 
being preferred. The microscopes are preferably located in an environment in which 
vibration and temperature variations are controlled, and fitted with a highly-sensitive digital 
camera. While many different cameras can be to record the fluorescent signals, the preferred 
cameras are intensified CCD type cameras such as the iPentaMax fiom Princeton 
Instruments. 

The method of detection involves illuminating the sanq)les at wavelengths sufficient 
to induce fluorescence of the tags, preferably in an internal-reflection format. If the 
fluorescent tags are a donor-acceptor pair, then the excitation frequency must be sufficient 
to excite the donor. Although any type of light source can be used, the preferred Ught source 
is a lasCT. It will often be advantageous to image the same sample in multiple fluorescence 
emission wavelengths, either inrq)id succession or simultaneously. For simultaneous multi- 
color unaging, an image splitter is preferred to allow the same CCD to collect all of the color 
images simultaneously. Alternatively, multiple cameras can be used, each viewing the 
sample through emission optical filters of different vravelength specificity. 

Tag detection in practice, of course, depends upon many variables including the 
specific tag used as well electrical, fluorescent, chemical, physical, electrochemical, mass 
isotope, or other properties. Single-molecule fluorescence imaging is obtainable employing 
a research-grade Nikon Diaphot TMD inverted epifluorescence microscope, upgraded with 
laser illumination and a more-sensitive camera. Moreover, single-molecule technology is a 
well-developed and commercially available technology. See, e.g.,¥ecketal., 1989; Ambrose 
etaL, 1994; Goodwin et al., 1997; Brouwer W al, 1999; Castro and WilUams, 1997; Davis 
et al., 1991; Davis et al., 1992; Goodwin et al, 1997; KeUer et al., 1996; MichaeUs et al, 
2000; Orrit and Bernard, 1990; Orrit et al., 1994; Sauer et al., 1999; Unger et al., 1999; 
Zhuanggfa/.,2000. 

The epifluorescence microscope can be retrofitted for evanescent-wave excitation 
using an argon ion laser at 488 nm. The inventors have previously used (his illumuwtion 
geometry in assays for nucleic acid hybridization studies. The existing setup has also been 
upgraded by replacement of the current CCD camera with a 1 2-bit 5 1 2 x 5 1 2 pixel Princeton 
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Instruments I-PentaMAX generation IV intensified CCD camera, which has been used 
successfully in a variety of similar single-molecule applications. This camera achieves a 
quantum efficiency of over 45% in the entire range of emission wavelengths of the dyes to 
be used, and considerably beyond this range. The vertical alignment of their existing 
microscope tends to minimize vibration problems, and the instrument is currentiy mounted 
on an anti-vibration table. 

A preferred high-sensitivity imaging system is based on an Olympus IX70-S8F 
inverted epifluorescence microscope. The system incorporates low-background components 
and enables capture of single molecule fluorescence images at rates of greater than 80 fi:ames 
per second with quantum efiBciency between 60 - 70% in the range of emission wavelengths 
of the fluorescentiy active tags. 

In imaging the fluorescence of multiple single molecules, it is preferable to minimize 
the occurrence of multiple fluorescent emitters within a data collection channel such as a 
single pixel or pixel-bin of the viewing field of the CCD or other digital imaging system. A 
finite number of data collection channels such as pixels are available in any given digital 
imaging apparatus. Randomly-spaced, densely-positioned fluorescent emitters generally 
produce an increased firaction of pixels or pixel bins that are multiply-occupied and 
problematic in data analysis. As the density of emitters in the viewing field increases so does 
the number of problematic data channels. While midtiple occupancy of distinguishable data 
collection regions within the viewing field can be reduced by reducing the concentration of 
emitters in the viewing field, this decrease in concentration of emitters increases the firaction 
of data collection channels or pixels that see no emitter at all, therefore, leading to inefScient 
data collection. 

A preferred method for increasing and/or maximizing the data collection efficiency 
involves controlling the spacing between emitters (tagged polymerase molecules). This 
spacing is achieved in a number of ways. First, the polymerases can be immobilized on a 
substrate so that only a single polymerase is localized witiiin each data collection channel or 
pixel region within the viewing field of the imaging system. The immobilization is 
accomplished by anchoring a capture agent or linking group chemically attached to the 
substrate. Capture or linking agents can be spaced to useful distances by choosing inherently 
large cq}ture agents, by conjugating them with or bonding them to molecules which enhance 
their steric bulk or electrostatic repulsion bulk, or by immobilizing under conditions chosen 
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to maximize repulsion between polymeriang molecular assembly (e.g., low ionic strength 
to maximize electrostatic repulsion). 

Alternatively, the polymerase can be associated with associated proteins that increase 
the steric bulk of the polymerase or the electrostatic repulsion bulk of the polymerizing 
system so that each polymerizing molecular assembly cannot JCTroach any closer than a 
distance greater than the data channel resolution size of the imaging system. 
fqfmffrm A^YitV AWftVg Using « Single-M olecule net^an Svat^ m 

These assays are performed essentially as described in for polymerase activity assays 
described herein. As stated above, the primary diffraence between assayii^ polymerase 
activity for soreening purposes involves the immobilization of some part of the polymerizing 
assembly such as the polymerase, target DNA or a primer associated protdn to a solid siqjport 
to enable viewing of individual replication events. A variety of immobilization options are 
available, including, without limitation, covalent and/or non-covalent attachment of one of 
flie molecular assemblies on a surface such as an organic surface, an inorganic surface, in or 
on a nanotubes or otiier shnilar nano-structures and/or m or on porous matrices. These 
immobilization techniques are designed to provide specific areas for detection of the 
detectable property such as fluorescent, NMR, or the like, where the spacing is sufficient to 
decrease or minimize data collection channels having multiple emitters. TTius, a preferred 
data collection method for single-molecule sequencing is to ensure that the fluorescently 
tagged polymerases are spaced apart within the viewing field of the imagining apparatus so 
that each data collection channel sees the activity of only a single polymerase. 
Analysis of Fluorescen t Signals from Single-molecule Sequencing System 

The raw data generated by the detector represents between one to four time-dependent 
fluorescence data streams comprising wavelengths and intensities: one data stream for each 
fluorescently labeled base being monitored. Assignment of base identities and reliabilities 
are calculated using the PHRED computer program. If needed, the inventors win write 
conq)uter programs to interpret the data streams having partial and overlapping data In such 
cases, multiple experiments are run so ftat confidence limits are assigned to each base 
idMitity according to the variation in the reliability mdices and fbe difficulties associated with 
assembling stretches of sequence 6om fragments. The reliability indices represent fbs 
goodness of the fit between the observed wavelengths and intensities of fluorescence 
compared with ideal values. The result of the signal analyses is a Imear DNA sequence with 
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associated probabilities of certainty. Additionally, when required, the data is stored in a 
database for dynamic querying for identification and comparison purposes. A search term 
(sequence) of 6- 10, 11-16, 17-20, 2 1-30 bases can be compared against reference sequences 
to quickly identify perfectly matched sequences or those sharing a user defined level of 
identity. Multiple experiments are run so that confidence limits can be assigned to each base 
identity according to the variation m the reliability indices and the difiGculties associated with 
assembling stretches of sequence from fi-agments. The reliability indices represent the 
goodness of the fit between the observed wavelengths and intensities of fluorescence 
compared with the ideal values. The result of the signal analyses is a linear DNA sequence 
with associated probabilities of certainty. 

INFORMATICS: ANALYSIS OF FLUORESCENT SIGNALS FROM THE SINGLE- 
MOLECULE DETECTION SYSTEM 

Data collection allows data to be assembled &om partial information to obtain 

sequence information fiom multiple polymerase molecules in order to determine the overall 

sequence of the template or target molecule. An important driving force for convolving 

together results obtained with multiple single-molecules is the impossibility of obtaining data 

firom a single molecule over an indefinite period of time. At a typical dye photobleaching 

efficiency of 2* 1 0'^, a ^ical dye molecule is expected to undergo 50,000 excitation/emission 

cycles before permanent photobleaching. Data collection fix)m a given molecule may also 

be interrupted by intersystem crossing to an optically inactive (on the time scales of interest) 

triplet state. Even with precautions against photobleaching, therefore, data obtained fi:om any 

given molecule is necessarily fi:agmentary for template sequences of substantial length, and 

these subsequences are co-processed in order to derive the overall sequence of a target DNA 

molecule. 

Additionally, in certain instances it is usefiil to perform reactions with reference 
controls, similar to microarray assays. Comparison of signal(s) between the reference 
sequence and the test sample are used to identify diflferences and similarities in sequences or 
sequence composition. Such reactions can be used for fast screening of DNA polymers to 
determine degrees of homololgy between the polymers, to determine polymorphisms in DNA 
polymers, or to identity pathogens. 

EXAMPLES 
Cloning and Mutagenesis of Taa Polymerase 
Cloning 
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Bacteriophage lambda host strain Charon 35 harboring the full-length of the Thermus 
aquaticus gene encoding DNA polymerase I (Tag pol I) was obtained fiom the American 
Type Culture Collection (ATCC; Manassas, VA). Tag pol I was amplified directly from the 
lysate of the infected £. coli host using the following DNA oligonucleotide primos: 
Tag Pol I forward f 
S'-gc gaattc atgaggggga tgctgcccct ctttgBgccc-3' 
Tag Pol I Kv&cx 

S'-gc gaatto accctccttgg cggagcgc cagtcctccc-3' 

The underUned segment of each synthetic DNA oligonucleotide rqwesents engineered 
EcoRI restriction sites immediately preceding and following the Tag pol I gene. PGR 
amplification using the reverse primer described above and the following forward primer 
(seated an additional construct with an N-terminal deletion of tiie gene: 
r(a^PolI_A293_trunk 

S'-aatccatgggccctggaggaggc cccctggcccccgc-3' 

The underlined segment corresponds to an engineered Ncol restriction site with the 
first codon aicoding for an alanine (ATG start representing an expression vector following 
the ribosome binding site). Ideally, the full-length and truncated constructs of the Tag pol I 
gene is Ugated to a single EcoRI site (fuU-lengfli) and in an NcoI/EcoRI digested pRSET-b 
expression vector. E. coli strain JM109 is used as host for aU in vivo manipulation of the 
engineered vectors. 
Mutagensis 

Once a suitable construct is generated, individual cysteine mutations are introduced 
at preferred amino acid positions including positions 513-518, 643, 647, 649 and 653-661 of 
the native Tag polymerase. The following amino acid residues correspond to the amino acids 
between amino acid 643 and 661, where xxx rejaesents intervening amino acid residues in 
the native polymerase: 

643-Alaxxx xxx xxx Phexxx Val xxx xxx (Hu Ala Val Asp Pro LeuMet Arg Aig Ala -661 
Overlapping primers are used to introduce point mutations mto the native g^e by 
PGR based mutagenesis (using Pfu DNA polymerase). 

Complemaitary forward and reverse primers each contam a codon that encodes the 
desired mutated amino add residue. PGR using these primers results in a knicked, non- 
methylated, double-stranded plasmid containmg the desired mutation. To remove the 
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template DNA, the entire PGR product is treated with Dpnl restriction enzyme (cuts at 
methylated guanosines in the sequence GATC). Following digestion of the template plasmid, 
the mutated plasmid is transformed and ligation occurs in vivo. 

The following synthetic DNA oligonucleotide primers are used for mutagenesis as 
described below, where the letters designated in lowercase have been modified to yield the 
desired Cysteine substitution at the indicated position. Mutants are thm screened via 
automated sequencing. 

Alanine 643 to Cysteine Replacement 
Tag Pol I_Ala643Cys_fwd 

5*-C CAC ACQ GAG AGO tgC AGC TGG ATG TTC GGC G-3' 
Tag Pol I_Ala643Cys_rev 

5'-C GGC GAA CAT CCA CGA Gca GGT CTC CGT GTG G-3' 

Phenylalanine 647 to Cysteine Rq)lacement 
Tag Pol I_Phe647Cys_fwd 

5'-CC GCC AGC TGG ATG TgC GGC GTC CCC CGG GAG GCC-3' 
Tag Pol I_Phe647Cys_rev 

5'-GGC CTC CCG GGG GAC GCC GcA CAT CCA CGT GGC GG-3' 

Valine 649 to Cysteine Rq)lacmeat 

Tag Pol I_Val649Cys_fwd 

5'-GCC AGC TGG ATG TTC GGC tgC CCC CGG GAG GCC GTG G-3' 
Tag Pol I_Val649Cys_rev 

5'-C CAC GGC CTC CCG GGG Gca GCC GAA CAT CCA GCT GGC-3' 

Glutamic Acid 652 to Cysteine Replacement 
Tag Pol I_Glu652Cys_fwd 

5'-GGC GTC CCC CGG tgc GCC GTG GAC CCC CTG ATG CGC-3' 
Tag PoII_Glu652Cys_rev 

5'-GCG CAT CAG GGG GTC CAC GGC gca CCG GGG GAC GCC-3' 

Alanine 653 to Cysteine Replacement 
Tag Pol I_Ala653Cys_fwd 

5'.<K3C GTC CCC CGG GAG tgC GTG GAC CCC CTG ATG CGC-3' 
Tag Pol IAla653Cys_iev 

5'-GCG CAT CAG GGG GTC CAC Gca CTC CCG GGG GAC GCC-3' 
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Valine 654 to Cysteine Replacement 

Tag Pol I_Val654Cys_fwd 

5'-GTC CCC CGG GAG GCC tgt GAC CCC CTG ATG CGC-3' 
Tag PolI_Val654Cys_rev 

5'-GCG CAT CAG GGG GTC aca GGC CTC CCG GGG GAC-3' 
Aspartic Acid 6SS to Cysteine Replacement 

ra^ PolI_D655C_fwd 

5'-CCC CGG GAG GCC GTG tgC CCC CTG ATG CGC CGG-3' 
ragPolI_D655C_rev 

5'-CCG GCG CAT CAG GGG Gca CAC GGC CTC CCG GGG-3' 

Proline 656 to Cysteine Replacement 

Tag Pol I_Pro656Cys_fwd 

5'-CGG GAG GCC GTG GAC tgC CTG ATG CGC CGG GCG-3' 
Tag Pol I_Pro656Cys_rev 

5'-CGC CCG GCG CAT CAG Gca GTC CAC GGC CTC CCG-3' 

Leucine 657 to Cysteine Replacement 
Tag Pol I_Leu657Cys_fwd 

5'-GCC GTG GAC CCC tgc ATG CGC CGG GCG GCC-3' 
Tag Pol I_Leu657Cys_rev 

5'-GGC CGC CCG GCG CAT gca GGG GTC CAC GGC-3' 

Methionine 658 to Cysteine Replacement 
Tag Pol I_Met658Cys_^d 

5'-GCC GTG GAC CCC CTG tgt CGC CGG GCG GCC-3' 
Tag Pol I_Met658Cys_rev 

5'-GGC CGC CCG GCG aca CAG GGG GTC CAC GGC-3' 

Arginine 659 to Cysteine Rq)lacement 

Tag Pol I_Aig659Cys_fwd 

5'-GCC GTG GAC CCC CTG ATG tGC CGG GCG GCC AAG ACC-3' 
Tag Pol I_Arg659Cys_rev 

5'-GGT CTT GGC CGC CCG GCa CAT CAG GGG GTC CAC GGC-3' 

Arginine 660 to Cysteine Replacemrat 
Tag Pol I_Arg660Cys_fwd 
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5'-GAC CCC CTG ATG CGC tOc GCG GCC AAG ACC ATC-3* 
Tag Pol LArg660Cys_rev 

5*-GAT GGT CTT GGC CGC gCa GCG CAT CAG GGG GTC-3' 

Alanine 661 to Cysteine Replacement 
Tag Pol LAla661Cys_fwd 

5*-CCC CTG ATG CGC CGG tgc GCC AAG ACC ATC AAC-3' 
Tag Pol I_Ala661Cys_rev 

5*-GTT GAT GGT CTT GGC gca CCG GCG CAT CAG GGG-3' 

The resulting mutant Tag polymerases are then reacted ^th a desired atomic or 
molecular tag to tag the cysteine in the mutant structure through the SH group of the cysteine 
residue and screened for native and/or tagged dNTP incorporation and incorporation 
efGciency. The mutant polymerases are also screened for fluorescent activity during base 
incorporation. Thus, the present invention also relates to mutant Taq polymerase having a 
cysteine residue added one or more of the sites selected from the group consisting of 5 1 3-S 1 8, 
643, 647, 649 and 653-661, After cysteine replacement and verification of polymerase 
activity using the modified dNTPs, the mutant Taq polymerases are reacted with a tag 
through the SH group of the inserted cysteine residue. 
Synthesis of Modified dNTPs 
Synthesis of (y-AmNS)dATP 

Nucleotide analogues which contain fluorophore l-aminonaphalene-5-sulfonate 
attached to the y-phosphate bond were synthesized {X Biol Chem,254J 2069-1 2073, 1979). 
dATP analog - (y-AmNS)dATP was synthesized according to the procedures slightly altered 
from what was described by Yarbrough and co-workers for (Y-AmNS)ATP with some 
modifications. 

This example illustrates the preparation of gamma ANS tagged dATP, shown 
graphically in Figure 4. 

l-Aminonaphthalene-5-sulphonic acid (447 mg, 2 mmol, 40 eq., fix)m Lancaster) was 
added to 10 mL of HjO, and the pH was adjusted to 5,8 with 1 N NaOH. The insoluble 
material was removed by syringe filter, yielding a solution vMch was essentially saturated 
for fiiis pH value (-0.18 to 0.2 M). 4 mL of 12.5 mM 5'triphosphate-2'-deoxyadenosine 
disodium salt (0.05 mmol, 1 eq.) and 2 mL of 1 M l-(3-dimetfaylaminopropyl)-3-efliyl- 
carbodiimide hydrochloride (DEC, 2 mmol, 40eq., from Lancaster) were added to a reaction 
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vessel at 22 °C. The reaction was initiated by adding 10 mL of the l-aminonaphthalene-5- 
sulfonate solution and allowed to continue for 2.5 h. The pH was kept between 5.65 - 5.75 
by periodic addition of 0.1 N HCI. After 2.5 h, the reaction was diluted to 50 mL and 
adjusted to a solution of 0.05 M triethylammonium bicarbonate buffer (TEAB, pH -7.5). The 

reactionproductwaschromatographedona50mLDEAE-SEPHADEXionexchanger(A-25- 
120) column at low temperature that was equilibrated with -^H 7.5 1 .0 M aqueous TEAB 
(100 mL), 1 .0 M aqueous sodium bicarbonate (100 mL), and~pH 7.5, 0.05 M aqueous TEAB 
(1 00 mL). The colunm was eluted with a linear gradient of ~pH 7.5 aqueous TEAB from 0.05 
to 0.9 M. Approximately 10 mL fractions were coUected. Absorbance and fluorescence 
profiles (UV 366nm) of the fractions were obtained after appropriate dilution. The 
fluorescent fraction eluted at -0.7 M buffer after the peak of the unreacted dATP and showed 
a brilliant blue fluorescence. The product-containing fractions were pooled, dried by 
lyophilizer and co-evaporated twice with HiO/ethanol (70/30). The residue was taken up in 
water and lyophilized. ^'P NMR ('H decoupled, 600 MHz, DjO, Me3P04 external reference, 
293 K, pH 6.1) 6 (ppm) -12.60, -14.10, -25.79. The reference compound dATP gave the 
following resonance peaks: ''P NMR (dATP, Na*) in DjO 293 K, 8 (ppm) -1 1 .53 (y), -13.92 
(a), -24.93 (P). 

Using diode array UV detection HPLC, the fiaction containing the desired product 
was easUy identified by tiie distinct absorption of the ANS group at 366 nm. Additionally, "P 
NMR spectra were recorded for the y-phosphate tagged dATP and regular dATP in an 
aqueous solution. For each compound, three characteristic resonances were observed, 
confinningttietriphosphatemoiety inthey-taggeddATP. The combined analyses - 'H-NMR, 
HPLC, and UV spectra - provide supporting information for tiie formation of the correct 
compound. 

The same syntiietic procedure was used to prepare y-ANS-phosphate modified dGTP, 
dTTP anddCTP, 

Y-PhosDhate-taggeJ il NTP Incorporation Bv Tag Polvnierase 

The foUowing examples iUustrate that conunercially available Tag DNA polymerase 
efficiently incorporates the ANS-y-phosphate dNTPs, the syntheses and characterization as 
described above. 

In the first example, illustrates the incorporation of ANS-y-phosphate dATP to 
produce extended DNA products from primer tanplates. The reactions were carried out in 
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extension buffer and the resulting Radiolabeled products were size separated on a 20% 
denaturing polyacryamide gel Data was collected using a phosphorimaging system. 
Referring now the Figure 5, Lane 1 contains 5* radiolabeled TOP probe in extension buffer. 
Lane 2 contains Tag DNA polymerase, 50 \jM dGTP incubated with a DNA duplex 
(radiolabeled TOP with excess 'BOT-Sau'). Lane 3 contains Tag DNA polymerase, 50 ^M 
dATP incubated with a DNA duplex (radiolabeled TOP with excess 'BOT-Sau'). Lane 4 
contains Tag DNA polymerase, 50 jiM ANS-y-dATP incubated with a DNA duplex 
(radiolabeled TOP with excess 'BOT-Sau*). Lane 5 contains Tag DNA polymerase, 50 nM 
dGTP incubated with a DNA duplex (radiolabeled TOP with excess BOT-T). Lane 6 
contains spill-over from lane 5. Lane 7 contains Tag DNA polymerase, 50 \iM dATP 
incubated with a DNA duplex (radiolabeled TOP with excess •BOT-T), Lane 8 contams Tag 
DNA polymerase, 50 jiM ANS-y-dATP incubated with a DNA duplex (radiolabeled TOP 
with excess 'BOT-P). Lane 9 contains Tag DNA polymerase, 50 dGTP mcubated with 
a DNA duplex (radiolabeled TOP with excess 'BOT-ST). Lane 10 contains Tag DNA 
polymerase, 50 \M dATP incubated with a DNA duplex (radiolabeled TOP with excess 
*B0T-3r). Lane 11 contains Tag DNA polymerase, ANS-y-dATP incubated with a DNA 
duplex (radiolabeled TOP with excess B0T-3T). Lane 12 contains 5' radiolabeled TOP 
probe in extension buffer. Lane 13 contains 5* radiolabeled TOP probe and Tag DNA 
polymerase in extension buffer. Oligonucleotide sequences are shown in Table V. 

Quantitative comparison of lane 1 with lane 4 demonstrates that very little non- 
specific, single-base extension was detected when ANS-y -dATP was included in the reaction, 
but the first incorporated base shovild be dGTP (which was not added to the reaction). 
Quantitative analysis of lanes 1 and 8 demonstrates that approximately 71% of the TOP 
primer are extended by a template-directed single base when ANS-y-dATP was included in 
the reaction and the first incorporated base should be dATP. Thus, Tag DNA polymerase 
incorporates y-^gg^ nucleotides. Equally important to the polymerase's ability to 
incorporate a y-tagged nucleotide is its ability to extend the DNA polymer after the modified 
dATP was incorporated. Comparison of lane 1 with lane 1 1 demonstrated that a DNA strand 
was extended after ay-tagged nucleotide was incorporated. Thus, incorporation of a modified 
nucleotide was not detrimental to polymerase activity. Note, too, that extension of the primer 
strand by incorporation of an ANS-y-nucleotide depended upon Watson-Crick base-pairing 
rules. In &ct, the fidelity of nucleotide incorporation was increased at least IS-fold by the 
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addition of this tag to the y-phosphate. 

This next example illustrates the synthesis of extended DNA polymers using all four 
ANS tagged y-phosphate dNTPs. Products generated in these reactions were separated on 
a 20% denaturing polyacrylamide gel, the gel was dried and imaged following overnight 
exposure to a Fuji BASIOOO imaging plate. Referring now to Figure 6, an image of (A) the 
actual gel, (B) a lightened phosphorimage and (C) an enhanced phosphorimage. Lane 
descriptions for A, B, and C follow; Lane 1 is the control containing purified 10-base primer 
extended to 11 and 12 bases by template-mediated addition of alpha-^^P dCTP. Lane 2 
includes the same primer that was incubated with double-stranded plasmid DNA at 96°C for 
3 minutes (to denature template), the reaction was brought to 3TC (to anneal 
primer-template). Tag DNA polymerase and all four natural dNTPs (100 uM, each) were 
added and the reaction was incubated at 37*^0 for 60 minutes. Lane 3 includes the same 
labeled primer that was incubated with double-stranded DNA plasmid at 96*^0 for 3 minutes, 
the reaction was DNA polymerase and all four gamma-modified dNTPs (1 00 uM, each) were 
added and the reaction was incubated at 37°C for 60 minutes. Lane 4 includes the control, 
purified 10-base primer that was extended to 11 and 12 bases by the addition of 
alpha-^^-dCTP was cycled in parallel with lanes 5-8 reactions. Lane 5 includes the same 
^^P-labeled primer that was incubated with double-stranded plasmid DNA at 96°C for 3 
minutes, the reaction was brought to 37°C for 10 minutes, during which time Tag DNA 
polymerase and all four natural dNTPs (100 uM, each) were added. The reaction was cycled 
25 times at 96*^C for 10 seconds, 37°C for 1 minute, and 70°C for 5 minutes. Lane 6 includes 
the same ^^P-labeled primer that was incubated with double-stranded plasmid DNA at 96°C 
for 3 mmutes, the reaction was brought to 37°C for 10 minutes, during which tune Tag DNA 
polymerase and all four gamma-modified dNTPs (100 uM, each) were added. The reaction 
was cycled 25 times at 96°C for 10 seconds, 37°C for 1 minute, and 70°C for 5 minutes. Lane 
7 includes nonpurified, 1 0-base, ^^P-labeled primer that was mcubated with double-stranded 
DNA plasnud at 96^*0 for3 minutes, the reaction was brought to 37**C for 10 minutes, during 
which time Tag DNA polymerase and all four natural dNTPs (100 uM, each) were added. 
The reaction was cycled 25 times at 96°C for 10 seconds, 37**C for 1 minute, and 70**C for 
5 minutes. Lane 8 includes nonpurified, 10-base, ^^-labeledprimerthat was incubated with 
double-stranded DNA plasmid at 96°C for 3 mmutes, the reaction was brought to 37*^0 for 
10 minutes, during which tune Tag DNA polymerase and all four gamma-modified dNTPs 
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were added. The reaction was cycled 25 times at 96''C for 10 seconds, 3TC for 1 minute, and 
70*^0 for 5 minutes. Evident in the reactions involving tagged dNTPs is a substantial decrease 
in pyrophosphorolysis as compared to reactions involving natural nucleotides. 

This next example illustrates the synthesis of long DN A polymers using all four ANS 
tagged Y-phosphate dNTPs. Each primer extension reaction was split into two fractions, and 
one fraction was electrophoresed through a 20% denaturing gel (as described above), while 
the other was electrophoresed through a 6% denaturing gel to better estimate product lengths. 
The gel was dried and imaged (overnight) to a Fuji BASIOOO imaging plate. Referring now 
to Figure 7, an unage of (A) the actual gel, (B) a lightened phosphorimage of the actual gel, 
and (C) an enhanced phosphorimage of the actual gel. Lane descriptions for A, B, and C 
follow: Lane 1 includes 123 Marker with size standards indicated at the left of each panel. 
Lane 2 contains the control, purified lO-base primer extended to 11 and 12 bases by 
template-mediated addition of alpha-^^P dCTP. Lane 3 contains the same -labeled primer 
that was incubated with double-stranded plasmid DNA at 96°C for 3 minutes (to denature 
template), the reaction was brought to 37''C (to anneal primer-template), Tag DNA 
polymerase and all four natural dNTPs (100 uM, each) were added and flie reaction was 
incubated at 37°C for 60 minutes. Lane 4 includes the same -labeled primer that was 
incubated with double-stranded DNA plasmid at 96°C for 3 minutes, the reaction was brought 
to 37*^C, Tag DNA polymerase and all four gamma-modified dNTPs (100 uM, each) were 
added and the reaction was incubated at 37°C for 60 minutes. Lane 5 includes the control, 
purified 10-base primer that was extended to 1 1 and 12 bases by the addition of alpha-^^P - 
dCTP was cycled in parallel with lanes 5-8 reactions. Lane 6 includes the same ^^P -labeled 
primer that was incubated with double-stranded plasmid DNA at 96°C for 3 minutes, the 
reaction was brov^t to 37°C for 10 minutes, during which time Taq DNA polymerase and 
all four natural dNTPs (1 00 uM, each) were added. The reaction was cycled 25 times at 96®C 
for 10 seconds, 37°C for 1 mmute, and 70®C for 5 minutes. Lane 7 includes the same ^^P - 
labeled primer that was incubated with double-stranded plasmid DNA at 96°C for 3 minutes, 
the reaction was brought to 37^C for 10 minutes, during which time Taq DNA polymerase 
and all four gamma-modified dNTPs (100 uM, each) were added. The reaction was (^cled 
25 times at 96''C for 10 seconds, 3TC for 1 minute, and 70^C for 5 minutes. Lane 8 includes 
nonpurified, 10-base, ^^P -labeled primer that was incubated with double-stranded DNA 
plasmid at 96^C for 3 minutes, the reaction was brought to 37^C for 1 0 minutes, during which 
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time Taq DNA polymerase and all four natural dNTPs (100 uM, each) were added. The 
reaction was cycled 25 times at 96°C for 10 seconds, 2>TC for 1 minute, and 70^C for 5 
minutes. Lane 9 includes nonpurified, 10-base, ^^P -labeled primer that was incubated with 
double-stranded DNA plasmid at 96°C for 3 mmutes, the reaction was brought to 37**C for 
10 minutes, during which time Taq DNA polymerase and all four gamma-modified dNTPs 
were added. The reaction was cycled 25 times at 96**C for 10 seconds, 37^C for 1 minute, and 
70T for 5 minutes. 

The majority of extension products in this reaction are several hundred bases long for 
both natural and y-modified dNTPs, and a significant percentage of these products are too 
large to enter the gel. Thus, demonstrating the gamma phosphate tagged dNTPs are used by 
Taq polymerase to generate long DNA polymers that are non-tagged or native DNA polymer 
chains. 

Dpfferent Polymerases React Differently to the Gamma-modified Nucleotides 

The indicated enzyme {Taq DNA Polymerase, Sequenase, HIV-1 Reverse 
Transcriptase, T7 DNA Polymerase, Klenow Fragment, Pfu DNA Polymerase) were 
incubated in the manufacturers suggested reaction buffer, 50 yM of tiie indicated nucleotide 
at 37''C for 30 - 60 minutes, and tiie reaction products were analyzed by size separation 
through a 20% denaturing gel. 

Taq DNA polymerase eflBcientiy uses the gamma-modified nucleotides to synttiesize 
extended DNA polymers at increased accuracy as shown in Figure 4-6. 

The Klenow firagment from £. coli DNA polymerase I eflBcientiy uses ttie gamma- 
modified nucleotides, but does not exhibit the extreme fidelity improvements observed with 
other enzymes as shown in Figure 8. 

Pfu DNA polymerase does not eflBcientiy use gamma-modified nucleotides and is, 
thus, not a preferred enzyme for the smgle-molecule sequencing system as shown in Figure 
9. 

HTV-l reverse transcriptase efficientiy uses flie gamma-tagged nucleotides, and 
significant fidelity improvement results as shown in Figure 10. 

Polymerization activity is difficuh to detect in the reaction products generated by 
native T7 DNA polymerase (due to tiie presence of tiie enzymes exonuclease activity). 
However, its genetically modified derivative, Sequenase, shows fliat the gamma-modified 
nucleotides are efficientiy incorporated, and tiiat mcorporation fidelity is improved, relative 
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to non-modified nucleotides. The experimental resxilts for native T7 DNA polymerase and 
Sequenase are shown in Figure 1 1 . 

Thus, for the Taq polymerase or the HTV 1 reverse transcriptase, improved fidelity, due 
to the use of the gamma-modified dNTPs of this invention, enables single-molecule DNA 
sequencing. However, not all polymerases equally utilize the ganuna-modifed nucleotides 
of this invention, specifically, Klenow, Sequenase, HIV-1 reverse transcriptase and Taq 
polymerases incorporate the modifed nucleotides of this invention, while the Pfii DNA 
polymerase does not appear to incorporate the modified nucleotides of this invention. 
Improved PGR - Generation of Long DNA Sequences 

The fidelity of nucleic acid synthesis is a limiting factor in achieving amplification 
of long target moleciiles using PGR. The misincorporation of nucleotides during the 
synthesis of primer extension products limits tiie lengtib of target tfiat can be efficiently 
amplified. The effect on primer extension of a 3*-terminal base that is mismatched with the 
template is described in Huang et al., 1992, NucL Acids Res. 20:4567-4573, incorporated 
herein by reference. The presence of misincoiporated nucleotides may result in prematurely 
terminated strand synthesis, reducmg the mmiber of template strands for future rounds of 
amplification, and thus reducing the efGciency of long target amplification. Even low levels 
of nucleotide misincorporation may become critical for sequences longer than 10 kb. The 
data shown in Figure 4 shows that the fidelity of DNA synthesis using gamma tagged dNTPs 
is unproved for the native Taq polymerase making longer DNA extension possible without 
the need for adding polymerases with 3'-to5' exonuclease, or "proofreading", activity as 
required in the long-distance PGR method developed by Gheng et al., U.S. Pat. Nos. 
5,512,462, incorporated herein by reference. Thus, the present invention provides an 
improved PGR system for generating increased extension length PGR amplified DNA 
products comprising contacting a native Taq polymerase with gamma tagged dNTPs of this 
invention under PGR reaction conditions. The extended length PGR products are due to 
improved accuracy of base incorporation, resulting firom the use of the gamma-modified 
dNTPs of fliis invention. 

Signal Intensity and Reaction Kinetics Provide Information Con cerning Base Identity 
Signal intensities for each nucleotide in the extended DNA strand are used to 
determine, confirm or support base identity data. Referring now to Figure 12, the solid line 
corresponds to reaction products produced when the four natural nucleotides (dATP, dGTP, 
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dGTP and dTTP) are included in the synthesis reaction. The dashed or broken line 
corresponds to reaction products produced when proprietary, base-modified nucleotides are 
included in the reaction. As is clearly demonstrated, sequence context and base 
modiHcation(s) influence reaction product intensity and/or kinetics, and these identifying 
patterns are incorporated into proprietary base-calling software to provide a high confidence 
value for base identity at each sequenced position. 

All references cited herein and Usted in are incorporated by reference. While this 
invention has been described fidly and completely, it should be undenrtood that, within the 
scope of the appended claims, the invention may be practiced otijerwise tiian as specifically 
described. Altiiough tiie invention has been disclosed witii reference to its preferred 
embodiments, firom reading tiiis description tiiose of skill in tiie art may appreciate changes 
and modification tiiat may be made which do not depart from tiie scope and spirit of tiie 
invention as described above and claimed hereafter. 
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CLAI M g 

We claim: 

L A composition comprising a polymerizing agent including at least one molecular 
and/or atomic tag located at or near, associated with or covalently bonded to a site on the 
polymerizing agent, where a detectable property of the tag undergoes a change before, during 
and/or after monomer incorporation. 

2. The composition of claim 1, wherein the detectable property has a first value when 
the polymerizing agent is in a first state and a second value when the polymerase is in a 
second state, and where the polymerizing agent changes firom the first state to the second state 
and back again during each monomer incorporation. 

3. The composition of claim 2, wherein the polymerizing agent is a polymerase or 
reverse transcriptase. 

4. The composition of claim 3, wherein the polymerase is selected fi:om the group 
consisting of Taq DNA polymerase I, T7 DNA polymerase, Sequenase, and tiie Klenow 
fragment firom £ coli DNA polymerase 1. 

5. The composition of claim 3, wherein the reverse transcriptase comprises HIV-1 
reverse transcriptase. 

6. The composition of claim 3 , wherein the polymerase comprises Taq DNA polymerase 
I having a tag attached at a site selected from the group consisting of 513-518, 643, 647, 649 
and 653-661 and mixtures or combinations tiiereof of the Taq polymerase, where the tag 
comprises a fluorescent molecule* 

7. A composition comprising apolymerase or reverse transcriptase including at least one 
molecular and/or atomic tag located at or near, associated with or covalentiy bonded to a site 
on the polymerase, where a detectable property has a first value when the polymerase is in 
a first state and a second value when the polymerase is in a second state during monomer 
incorporation, and where the polymerizing agent changes firom the first state to the second 
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state and back again during each monomer incorporatioa 

8. The composition of claim 7, wherein the polymerase is selected from the group 
consisting of Tag DNA polymerase I, T7 DNA polymerase, Sequenase, and the Klcnow 
fragment from K coli DNA polymerase L 

9. The composition of claim 7, wherem the reverse transcriptase comprises HTV-l 
reverse transcriptase. 



10. A composition comprising a polymerizmg agent mcludmg a molecular and/or atomic 
tag associated with or covalently bonded to a site on the polymerase and a monomer 
including a molecular and/or atomic tag, where at least one of the tags has a detectable 
property that undergoes a change before, during and/or after monomer incorporation due to 
an interaction between the polymerizing agent tag and flie monomer tag. 

11. The composition of claim 10, wherein the change in the detectable property results 
from a change in the conformation of the polymerase from a first conformational state to a 
second conformational state and back again during each monomer incorporation. 

12. The composition of claim 10, wherein the detectable property has a first detection 
propensity when the polymerase is in the first conformational state and a second detection 
propensity when the polymerase is in the a second conformational state. 

13. The composition of claim 12, wherein the polymerizing agent is a polymerase or 
reverse transcriptase. 

14. The composition of claun 13, wherein the polymerase is selected from the group 
consisting of Taq DNA polymerase I, T7 DNA polymerase. Sequenase, and the Klenow 
fragment from E. coli DNA polymerase I. 

15. The composition of claim 13, wherein the reverse transcriptase comprises HIV-1 
reverse transcriptase. 
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1 6. The composition of claim 1 2, wherein the monomer comprise a dNTP and the tag is 
covalently bonded to the p or y phosphate group. 

17. The composition of claim 10, wherein the tag comprises a fluorescent tag and the 
detectable property comprises an intensity and/or frequency of emitted light 

1 8. The composition of claim 1 6, wherein the detectable property is substantially active 
when the polymerase is in the first conformational state and substantially inactive when the 
polymerase is in the second conformational state or substantially inactive when the 
polymerase is in the first conformational state and substantially active when the polymerase 
is in the second conformational state. 

19. The composition of claim 14, wherein the polymerase comprises Tag DNA 
polymerase I having a tag attached at a site selected from the group consisting of SI 3-518, 
643, 647, 649 and 653-661 and mixtures or combinations thereof of the Tag polymerase, 
where the tag comprises a fluorescent molecule. 

20. A composition comprising a polymerase or reverse transcriptase including a pair of 
tags located at or near, associated with or covalently bonded to a site of the polymerase, 
where a detectable property of at least one of the tags undergoes a change before, during 
and/or after monomer incorporation. 

2 1 . The composition of claim 20, wherein the detectable property has a first value when 
the polymerase is in a first state and a second value when the polymerase is in a second state, 
and where the polymerizing agent changes firom the first state to the second state and back 
again during each monomer incorporation. 

22. The composition of claim, 21, wherein the polymerase is selected from the group 
consisting of Tag DNA polymerase I, T7 DNA polymerase, Sequenase, and the Klenow 
fi:agment fipom £. coli DNA polymerase L 
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23. The composition of claim 21, wherein the reverse transcriptase comprises MV-l 
reverse transcriptase. 

24. The composition of claim 22, wherein the polymerase comprises Taq DNA 
polymerase I having a tag attached at a site selected fix)m the group consisting of 513-518, 
643, 647, 649 and 653-661 and mixtures or combinations thereof of the Taq polymerase, 
where the tag comprises a fluorescent molecule. 

24. Asinglemoleculesequencingapparatuscon^risingasubstratehavingafirstchamber 
in which at least one tagged polymerase is confined therein and a second chamber including 
tagged dNTPs and a channel interconnecting the chambers, where a detectable property of 
at least one tag undergoes a detectable change during a monomer incorporation cycle, 

25. The apparatus of claims 24, further comprising a plurality of monomer chambera, one 
for each tagged dNTP. 

26. A mutant Taq polymerase comprising native Taq polymerase with a cysteine residue 
replacement at a site selected fix>m the group consisting of 5 1 3-5 1 8, 643, 647, 649 and 653- 
661 and mixtures or combinations thereof 

27. The polymerase of claim 27, wherein the cysteine residue includes a tag covalentiy 
bonded thereto through the SH groiq>. 

28. A ^stem for retrieving stored information comprising: 

a unknown nucleotide sequence representing a data stream; 

asingle-moleculesequencerincludingapolymerasehavingatagassociatedtherewith 
and monomers for the polymerase, each monomer having a tag associated therewith; 
an excitation source adapted to excite the at least one of the tags; and 
a detector adapted to detect a response fix>m at least one of the ta& 
where the response changes during polymerization of a complementary sequence and 
die changes in response represent a content of the data stream. 
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1 29. A system for determining sequence information from a single molecule comprising: 

2 a unknown nucleotide sequence; 

3 a single-molecule sequencer comprising a polymerase having a tag associated 

4 therewith and monomers for the polymerase, each monomer having a tag associated 

5 therewith; 

6 a excitation source adapted to excite at least one of the tags; and 

7 a detector adapted to detect a response from at least one of the tags, 

8 where the response changes during polymerization of a complementary sequence and 

9 the changes in the response represent the identity of each nucleotide in the unknown 

0 sequence. 

1 30. A method for sequencing a molecular sequence comprising: 

2 supplying an unknown sequence of nucleotides or nucleotide analogs to a single- 

3 molecule sequencer comprising a polymerase having a fluorescent donor covalently attached 

4 thereto and monomers for ih& polymerase, each monomer having a unique fluorescent 

5 acceptor covalently bonded thereto; 

6 exciting the fluorescent donor with a light from an excitation Ught source; 

7 detecting emitted fluorescent light from the acceptor during a monomer incorporation 

8 cycle via a fluorescent light detector, where an intensity and/or frequency oftheeniitted light 

9 for the acceptors changes during each monomer incorporation cycle; and 

0 converting the changes into an identity of each nucleotide or nucleotide analog in the 

1 unknown sequene. 

1 3 1 . A method of sequencing an individual nucleic acid molecule or numerous individual 

2 molecules in parallel including the steps of: 

3 immobilizing a member of the replication complex comprising a polymerase including 

4 a tag attached thereto, a primer or a template sufficiently spaced apart to allow resolution 

5 detection of each complex on a solid support; 

6 incubating the replication complex with cooperatively-tagged nucleotides, each 

7 nucleotide including a unique tag at its gamma-phosphate, where each nucleotide can be 

8 individually detected; 

9 detecting each nucleotide incorporated by the polymerase as the polymerase 
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transitions between its open and closed form, which causes a change in a detectable property 
of at least one of the tags or as the pyrophosphate group is released by the polymerase; and 

relating the changes in the detectable property to the sequence of nucleotides in an 
unknown nucleic acid sequence. 



32 



A Y-phosphate modified nucleoside comprising y-phosphate modified dATP, dCTP, 
dGTPanddTTP. 



33. Aprimersequenceorportionthereofselectedfix>mthegroupconsistingofSequence 
1 through 29. 
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Primer Strand; 






TOP 5' 


GGT ACT AAG CGG CCG CAT G 3 * 




Template Strands: 






BOT-T 3' 


CCA TGA TTC GCC GGC GTA CTC 5' 




BOT-C 3* 


CCA TGA TTC GCC GGC GTA CCC 5' 




BOT-G 3* 


CCA TGA TTC GCC GGC GTA CGC 5 * 




BOT-A 3* 


CCA TGA TTC GCC GGC GTA CAC 5' 




B0T-3T 3* 


CCA TGA TTC GCC GGC GTA CTT TC 


5* 


BOT-Sau 3' 


CCA TGA TTC GCC GGC GTA CCT AG 


5' 



Incorporate : 
(5" to 3') 



GATC AO AAAS 




5 base eactezision 
4 base eactenslon 
3 base eacte&sion 
2 base extensioa 
1 base extensioa 
19 base "TOP" 



Lane 1 2 3 4 5 6 7 8 S 11 13 

10 12 
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Klenow Taq 

Enzyme . + + + + + + + 

Primer (TOP) + + + + + ++ +++ + + 

Template - BOT-3T BOT -T BOT-Sau BOT-3T 



Nudeotide 
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♦ Primer Strand: 

Top 5' GGT ACT AAG 

• Template Strands: 

3T 3' CCA TGA TTC 
Sau 3' CCA TQA TTC 



CGG CCG CAT G 3' 



GCC GGC GTA CTT TC 5' 
GCC GGC GTA CCT AG 5 ' 
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Primer Strand; 

Top 5' GGT ACT AAG 

Teitiplate Strands: 

B0T-3T 3' CCA TGA 
BOT-Sau 3' CCA TGA 



CGG CCG CAT G 3' 

TTC GCC GGC GTA CTT TC 5' 
TTC GCC GGC GTA CCT AG 5' 
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SEQUENCE LISTING 

SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(iii) NUMBER OF SEQUENCES: 23 

(2) INFORMATION FOR SEQ ID N0:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Tag Poll for 

(B) LOCATION: 

(D) OTHER INFORMATION: Used to amplify full- 
length Tag Pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:1: 

Tag Pol I forward 

5'-gc gaattc atgaggggga tgctgcccct ctttgagccc-3 ' 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Tag Pol I rev 

(B) LOCATION: 

(D) OTHER INFORMATION: Used to amplify full- 
length Tag Pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Tag Pol I reverse 

5 ' -gc gaattc accctccttgg cggagcgc cagtcctccc-3 ' 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Tag Pol I_A293_trunk 

(B) LOCATION: 

(D) OTHER INFORMATION: used to amplify 
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truncated form of Tag DNA pel I codinq 
sequence ^ 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Tag Pol I_A293_trunk 

5 • -aatccatgggccctggaggaggc cccctggcccccgc-3 • 

. (2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Tag Pol I Ala643Cys fwd 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xa) SEQUENCE DESCRIPTION: SEQ ID NO: 4 1 

Alanine 643 to Cysteine Replacement 
Tag Pol I_Ala643Cys_fwd 

5'-C CAC ACG GAG ACC tgC AGC TGG ATG TTC GGC G-3 • 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Tag Pol I Ala643Cys rev 

(B) LOCATION: ~ 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:5r 

Tag Pol I_Ala643Cys_rev 

5'-C GCC GAA CAT CCA CGA Gca GGT CTC CGT GTG G-3' 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
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(ix) FEATURE: 

(A) NAME/KEY: Tag Pol I_Phe647Cys_f wd 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Phenylalanine 647 to Cysteine Replacement 
Tag Pol I_Phe647Cys_fwd 

5'-CC GCC AGC TGG ATG TgC GGC GTC CCC CGG GAG GCC-3' 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1119 bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Tag Pol I_Phe647Cys_rev 

(B) LOCATION: 

(D) OTHER INFORMATION: pro used to introduce 
cys mutation into Tag DNA pol I coding 
sequence . 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

Tag Pol I_Phe647Cys_rev 

5' -GGC CTC CCG GGG GAC GCC GcA CAT CCA CGT GGC GG-3' 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Tag Pol I_Val649Cys_fwd 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Valine 649 to Cysteine Replacement 
Tag Pol I_Val649Cys_fwd 

5' -GCC AGC TGG ATG TTC GGC tgC CCC CGG GAG GCC GTG G-3 ' 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 
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(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I Val649Cys rev 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Taq DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Taq Pol I_Val649Cys_rev 

5'-C CAC GGC CTC CCG GGG Gca GCC GAA CAT CCA GCT GGC-3' 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I Glu652Cys fwd 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Glutamic Acid 652 to Cysteine Replacement 
Taq Pol I_Glu652Cys_fwd 

5' -GGC GTC CCC CGG tgc GCC GTG GAC CCC CTG ATG CGC-3' 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: • 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Poll Glu652Cys rev 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Taq DNA pol I coding sequence. 
(XI) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 



Tag PolI__Glu652Cys_rev 

5'-GCG CAT CAG GGG GTC CAC GGC gca CCG GGG GAC GCC-3' 
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(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I_Ala653Cys_fwd 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Alanine 653 to Cysteine Replacement 
Taq Pol I_Ala653Cys_fwd 

5'-GGC GTC CCC CGG GAG tgC GTG GAC CCC CTG ATG CGC-3' 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I_Ala653Cys_rev 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Taq DNA pol I coding sequence. 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Taq Pol I_Ala653Cys_rev 

5'-GCG CAT CAG GGG GTC CAC Gca CTC CCG GGG GAC GCC-3' 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I_Val654Cys_fwd 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Taq DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
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Valine 654 to Cysteine Replacement 
Tag Pol I_Val654Cys_fwd 

5'-GTC CCC CGG GAG GCC tgt GAC CCC CTG ATG CGC-3' 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(IX) FEATURE: 

(A) NAME/KEY: Tag Poll Val654Cys rev 

(B) LOCATION: ~ j _ cv 

muLtioTi '.'^T^^^^'N^ to introduce cys 

"^^^ation into Tag DNA pol I coding seouence 
(XI) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Tag PolI_Val654Cys_rev 

5'-GCG CAT CAG GGG GTC aca GGC CTC CCG GGG GAC-3' 

(2) INFORMATION FOR SEQ ID N0:16- 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(IX) FEATURE: 

(A) NAME/KEY: Tag Pol I D655C fwd 

(B) LOCATION: ~ " 

(D) OTHER INFORMATION: used to introduce cys 

(xi) Sn?e iE^?Jp?.oT III L^S:^^-^"- 

5' -CCC CGG GAG GCC GTG tgC CCC CTG ATG CGC CGG-3' 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(IX) FEATURE: 

(A) NAME/KEY: Tag Pol I D655C rev 

(B) LOCATION: ~ ~ 
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(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xl) SEQUENCE DESCRIPTION: SBQ ID NO: 17: 

Tag Pol I_D655C_rev 

5'-CCG GCG CAT CAG GGG Gca CAC GGC CTC CCG GGG-3' 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Tag Pol I_Pro656Cys_fwd 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Proline 656 to Cysteine Replacement 
Tag Pol I_Pro656Cys_fwd 

5'-CGG GAG GCC GTG GAC tgC CTG ATG CGC CGG GCG-3' 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Tag Pol I_Pro656Cys_rev 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Tag Pol I_Pro656Cys_rev 

5' -CGC CCG GCG CAT CAG Gca GTC CAC GGC CTC CCG- 3' 

(2) INFORMATION FOR SEQ ID NO: 20: 
(1) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
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(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I Leu657Cys fwd 

(B) LOCATION: ~ 

(D) OTHER INFORMATION: used to introduce cys 
/ eoif^^°" ^^*3r DNA pol I coding sequence, 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Leucine 657 to Cysteine Replacement 
Taq Pol I_Leu657Cys_fwd 

5'-GCC GTG GAC CCC tgc ATG CGC CGG GCG GCC-3' 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I Leu657Cys rev 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Taq Pol I_Leu657Cys_rev 

5'-GGC CGC CCG GCG CAT gca GGG GTC CAC GGC-3' 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I Met658Cys fwd 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 

. '^ ^""^^ I coding sequence. 

(XX) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Methionine 658 to Cysteine Replacement 
Taq Pol I_Met658Cys_fwd 

5'-GCC GTG GAC CCC CTG tgt CGC CGG GCG GCC-3' 

(2) INFORMATION FOR SEQ ID NO: 23: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: bases 
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(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME /KEY: Taq Pol I_Met658Cys_rev 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Taq DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

Tag Pol I_Met658Cys_rev 

5'-GGC CGC CCG GCG aca CAG GGG GTC CAC GGC-3' 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I_Arg659Cys_fwd 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 

Arginine 659 to Cysteine Replacement 
Tag Pol I_Arg659Cys_fwd 

5'-GCC GTG GAC CCC CTG ATG tGC CGG GCG GCC AAG ACC-3' 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I_Arg659Cys_rev 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Tag DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

Tag Pol I_Arg659Cys_rev 

5'-GGT CTT GGC CGC CCG GCa CAT CAG GGG GTC CAC GGC-3' 
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(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I_Arg660Cys fwd 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce c 
mutation into Taq DNA pol I coding sequence.' 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Arginine 660 to Cysteine Replacement 
Taq Pol I_Arg660Cys_fwd 

5'-GAC CCC CTG ATG CGC tGc GCG GCC AAG ACC ATC-3 ' 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I_Arg660Cys rev 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce c^ 
mutation into Taq DNA pol I coding sequence." 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

Taq Pol I_Arg660Cys_rev 

5' -GAT GGT CTT GGC CGC gCa GCG CAT CAG GGG GTC-3' 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Taq Pol I Ala661Cys fwd 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cv 
mutation into Taq DNA pol I coding sequence 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28- 
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Alanine 661 to Cysteine Replacement 
Tag Pol I_Ala661Cys_fwd 

5'-CCC CTG ATG CGC CGG tgc GCC AAG ACC ATC AAC-3' 

(2) INEX3RMATI0N FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(ix) FEATURE: 

(A) NAME/KEY: Tag Pol I_Ala661Cys_rev 

(B) LOCATION: 

(D) OTHER INFORMATION: used to introduce cys 
mutation into Taq DNA pol I coding sequence, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Tag Pol I_Ala661Cys__rev 

5'-GTT GAT GGT CTT GGC gca CCG GCG CAT CAG GGG-3' 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Taq DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Taq DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Cys Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Pro Leu 
Met 

643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 
658 

Arg Arg Ala 
659 660 661 

(2) INFORMATION FOR SEQ ID NO: 31: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
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(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 

polymerase 
(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Tag DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Ala Ser Trp Met Cys Gly Val Pro Arg Glu Ala Val Asp Pro Leu 
Met 

643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 
658 

Arg Arg Ala 

659 660 661 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of . 
Tag DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Ala Ser Trp Met Phe Gly Cys Pro Arg Glu Ala Val Asp Pro Leu 
Met 

643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 

658 

Arg Arg Ala 
659 660 661 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 
polymerase 
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(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Taq DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

Ala Ser Trp Met Phe Gly Val Pro Arg Cys Ala Val Asp Pro Leu 
Met 

643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 
658 

Arg Arg Ala 
659 660 661 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Taq DNA 

polymerase 
(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Taq DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Ala Ser Trp Met Phe Gly Val Pro Arg Glu Cys Val Asp Pro Leu 
Met 

643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 
658 

Arg Arg Ala 

659 660 661 

(2) INFORMATION FOR SEQ ID NO:35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Taq DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 
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(D) OTHER INFORMATION: polypeptide region of 
Tag DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Cys Asp Pro Leu 
Met 

643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 
658 

Arg Arg Ala 
659 660 661 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Tag DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Cys Pro Leu 
Met 

643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 
658 

Arg Arg Ala 
659 660 661 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
rag DNA polymerase containing mutation for 
tag attachment. 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Cys Leu 

Met 

643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 
658 

Arg Arg Ala 
659 660 661 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 
polymerase 

(ix) FEATURE: 

(A) NAME /KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Tag DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Pro Cys 
Met 

643 644 645 646 647 648 649 650 651 652 6*53 654 655 656 657 

658 

Arg Arg Ala 
659 660 661 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Taq DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Taq DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Pro Leu 
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Cys 

643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 
658 

Arg Arg Ala 
659 660 661 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Tag DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

Met "^^^ ^"^^ ^^'^ ^"""^ ^^"^ 

643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 
658 

Cys Arg Ala 

659 660 661 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Taq DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Taq DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

Ala Ser Trp Met Phfe Gly Val Pro Arg Glu Ala Val Asp Pro Leu 
Met 

•643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 

658 
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Arg Cys Ala 
659 660 661 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Tag DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Pro Leu 
Met 

643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 

658 

Arg Arg Cys 
659 660 661 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Tag DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

Cys Thr Ser Ala Ala Val 
513 514 515 516 517 518 

(2) INFORMATION FOR SEQ ID NO: 44: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 
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(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 

polymerase 
(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Tag DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

Ser Cys Ser Ala Ala Val 
513 514 515 516 517 518 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 
polymerase 

(ix) FEATURE: 

(A) NAME /KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Tag DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

Ser Thr Cys Ala Ala Val 
513 514 515 516 517 518 

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Tag DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Tag DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: 
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Ser Thr Ser Cys Ala Val 
513 514 515 516 517 518 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Taq DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Taq DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 

Ser Thr Ser Ala Cys Val 
513 514 515 516 517 518 

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: 

(ii) MOLECULE TYPE: polypeptide region of Taq DNA 
polymerase 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: polypeptide region of 
Taq DNA polymerase containing mutation for 
tag attachment, 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 



Ser Thr Ser Ala Ala Cys 
513 514 515 516 517 518 
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FURTHER INFORMATION CONTINUED FROM PCT/ISA/ 210 



1. Claims: 1-4,6-8,27 

A composition comprising a polymerising agent Including at 
least one molecular and/or atomic tag located at or near, 
associated with or covalently bonded to a site on the 
polymerising agent, where a detectable property of the tag 
undergoes a change before, during and/or after monomer 
Incorporation, according to claims 1 and 2, and wherein the 
polymerising agent Is a polymerase according to claims 
3,4,6-8, and the mutant Taq polymerase wherein the cystalne 
residue Includes a tag according to claim 27. 



2. Claims: 1-3,5,7,9 

A composition comprising a polymerising agent Including at 
least one molecular and/or atomic tag located at or near, 
associated with or covalently bonded to a site on the 
polymerising agent, where a detectable property of the tag 
undergoes a change before, during and/or after monomer 
Incorporation, according to claims 1 and 2, and wherein the 
polymerising agent Is a reverse transcriptase according to 
claims 3,5,7 and 9. 



3. Claims: 10-19, 24*, 25, 28-31 

A composition comprising a polymerising agent including a 
molecular and/or atomic tag associated with or covalently 
bonded to a site on the polymerase and a monomer including a 
molecular and/or atomic tag, wherein at least one of the 
tags has a detectable property that undergoes a change 
before, during and/or after monomer incorporation due to an 
Interaction between the polymerising agent tag and the 
monomer tag according to claims 10-19. 
A single molecule sequencing apparatus comprising a first 
chamber In which at least one tagged polymerase is confined 
therein and a second chamber Including tagged dNTPs and a 
channel interconnecting the chambers, where a detectable 
property of at least one tag undergoes a change during a 
monomer Incorporation cycle, according to claims 24* and 25. 
A system for retrieving stored information and determining 
sequence information according to claims 28 and 29 and a 
method of sequencing according to claims 30 and 31, wherein 
the systems and the methods are characterised by the use of 
the tagged polymerase and the tagged monomers as above. 



24* - It should be noted that the set of claims contained In 
the present international application comprises two 
different claims 24. The first "claim 24" Is directed to the 
composition of claim 22, and 1s concerned with a the later 
subject 4. The second "claim 24" is directed to a single 
molecule sequencing apparatus and Is concerned with the 
present subject 3. 
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FURTHER INFORMATION COHTINUeD FROM PCTASA/ 210 



4. Claims: 20-24* 

A composition comprising a polymerase or reverse 
transcriptase Including a pair of tags located at or near, 
associated with or covalently bonded to a site of the 
polymerase, wherein a detectable property of at least one of 
the tags undergoes a change before, during and/or after 
monomer Incorporation according to claims 20-24* 

24* - It should be noted that the set of claims contained in 
the present International application comprises two 
different claims 24. The first "claim 24" Is directed to the 
composition of claim 22, and Is concerned with a the present 
subject 4. The second "claim 24" Is directed to a single 
molecule sequencing apparatus and is concerned with the 
earlier subject 3. 



5. Claim : 26 

A mutant Tag polymerase comprising native Tag polymerase 
with a cysteine residue replacement at a site selected 
according to claim 26. 



6. Claim : 32 

A gamma-phosphate modified nucleoside according to claim 32. 



7. Claim : 33 

A primer sequence or portion thereof according to claim 33. 
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